Exploring the Initial Impact of COVID-19 Sentiment on US Stock Market Using Big Data

Abstract: This study explores the initial impact of COVID-19 sentiment on the US stock market using big data. Using the Daily News Sentiment Index (DNSI) and Google Trends data on coronavirus-related searches, this study investigates the correlation between COVID-19 sentiment and 11 select sector indices of the United States (US) stock market over the period from 21st of January 2020 to 20th of May 2020. While extensive research on sentiment analysis for predicting stock market movement uses Twitter data, little of it has used DNSI or Google Trends data. In addition, this study examines whether changes in DNSI predict US industry returns differently by estimating a time series regression model with industry excess returns as the dependent variable. The excess returns are obtained from the Fama-French three-factor model. The results offer a comprehensive view of the initial impact of COVID-19 sentiment on the US stock market by industry and, by visualizing how the correlation level changes across time lags, suggest how strategic investment planning can take time lags into account.


Introduction
The COVID-19 pandemic is one of the most economically costly pandemics in recent history. It appears to differ fundamentally from past epidemics such as SARS in 2003 and Ebola from 2014 to 2016. The coronavirus spreads much faster globally because of closer international integration and the possibility of transmission through carriers without symptoms. As the COVID-19 outbreak rapidly spreads around the world, many people are becoming more sensitive to the news and are searching Google for coronavirus more often. News and Google searches for coronavirus play a key role in keeping people informed about the current state of the crisis and thereby influence investors' decisions in the stock market. As such, news and Google searches for coronavirus may have an impact on stock market sentiment and asset prices. Unless the pandemic ends, it is expected to force new ways of asset management and to drive a preference for short-term, agile investment decisions.
It is widely acknowledged that stock market investors need high-quality data to make informed decisions. In times of market crisis in particular, investors need technology to obtain timely and accurate data. With high-quality data, investors can analyze and decide quickly in the asset management process and then react quickly to volatile market conditions. Any positive or negative public sentiment related to a stock market crisis can have a ripple effect on investors' decision making. Combining the information available in stock markets with real-time big data from non-traditional sources, such as news sentiment data or Google Trends data, could further enhance investors' understanding of the heterogeneous impact of the COVID-19 shock and their ability to develop adequate responses. Real-time big data could also help overcome limitations in official data, such as low quality, limited coverage, or reporting lags that in some cases could be substantial.
To sustain a competitive advantage during market crisis periods, stock market investors need not only to understand the nature of the crisis, such as its timing, strength, and variability, but also to make strategic investment decisions that can realize positive returns or minimize the loss due to a shock. Once a shock has substantially affected stock markets, investors analyze data from the past few months to draw lessons from the crisis and develop an indicator that lets them respond faster and smarter to market volatility. Investors can then better protect their portfolios when such a shock occurs again. A number of previous studies use sentiment analysis to predict stock market movement using big data such as Twitter data or other social media data.
This research aims to explore the initial impact of COVID-19 sentiment on the US stock market by industry using big data. Although the outbreak of COVID-19 is affecting almost all industries and sectors worldwide, consumer spending in sectors like leisure and hospitality is clearly falling dramatically because of shelter-in-place and other social distancing measures imposed across the country. As such, the degree of COVID-19 sentiment impact is expected to vary by industry. Therefore, we investigate the correlation of COVID-19 sentiment with 11 select sector indices of the US stock market. COVID-19 sentiment is measured by the Daily News Sentiment Index (DNSI) and Google Trends big data on coronavirus-related searches. The DNSI is a high-frequency measure of economic sentiment based on lexical analysis of economics-related news articles from 16 major US newspapers. Specifically, this study examines DNSI and Google searches for five terms related to coronavirus and the economy, in the US as well as worldwide, over the period from 21st of January 2020 to 20th of May 2020. The five terms are "coronavirus", "laid off", "unemployment", "recession", and "vaccine". This study then examines the significance of the relationship between COVID-19 sentiment and the 11 select sector indices to offer a comprehensive view of the initial impact of COVID-19 sentiment on the US stock market by industry. By visualizing how the correlation level changes across time lags, it also suggests how strategic investment planning can take time lags into account. In addition, this study investigates whether changes in DNSI predict US industry returns differently by estimating a time series regression model with industry excess returns as the dependent variable. The excess returns are obtained from the Fama-French three-factor model.
This study is at the forefront of research on the relationship between COVID-19 sentiment and the US stock market. Because the ongoing COVID-19 pandemic was only confirmed to have reached the US in January 2020, there is little research on its impact on the US stock market. While extensive research on sentiment analysis for predicting stock market movement uses Twitter data, little of it has used DNSI or Google Trends data. To the best of our knowledge, this study is the first to use DNSI or Google Trends data to explore the initial impact of COVID-19 sentiment on the US stock market by industry.
The rest of this paper is organized as follows. Section 2 describes previous studies on sentiment analysis for the stock market. Section 3 presents the data and methodology for our analysis. The empirical results are presented and interpreted in Section 4. Concluding remarks are presented in Section 5.

Related Studies
Sentiment analysis has been widely used in many applications such as product recommendations, healthcare, politics, and surveillance [22]. People's sentiment, that is, the feelings, attitudes, emotions, and opinions expressed in a large amount of social media data, is found to play a key role in gauging the opinions of investors [13]. Tetlock [19] systematically explored the interaction between media content and stock market activity using daily content from a Wall Street Journal column from 1984 to 1999 and showed that news media content predicts movements in market prices and trading volume. Providing further support for Tetlock's [19] evidence, Garcia [8] found that the predictability of stock returns using news content is concentrated in recessions, using the fraction of positive and negative words in two columns of financial news from The New York Times as a proxy for public sentiment.
In addition to the studies using news media content from popular newspapers, a number of studies have used social media data for sentiment analysis to predict stock market movement. The most well-known study in this area is by Bollen et al. [3]. Using Twitter data, they investigated whether the collective mood states of the public are correlated with the Dow Jones Industrial Index. They used a fuzzy neural network for their prediction and showed a significant correlation between public mood states on Twitter and the Dow Jones Industrial Index. Following this study, Mittal and Goel [13] mainly used the profile of mood states (POMS) questionnaire to capture public mood and predicted the Dow Jones Industrial Average trend via a fuzzy neural network. Their results showed a prediction accuracy rate of 87%. Zhang [18] examined the correlation between stock prices and significant keywords in tweets and showed a strong negative correlation between mood states like hope, fear, and worry in tweets and the Dow Jones Average Index. Lima et al. [12] improved the accuracy of predicting stock trends using a support vector machine (SVM) by considering an overall public sentiment attribute. They showed that a day on which the number of positive tweets exceeded the number of negative tweets is indicative of an overall positive public mood regarding the stock. Using Twitter data, Pagolu et al. [15] also showed a strong correlation between public opinion and the Dow Jones Industrial Index. Kordonis et al. [10] employed simple metrics such as the rate of change in opening and closing prices along with sentiment scores. They used an SVM model to predict stock market movement and found a significant effect of changes in public sentiment on market prices. Bharathi and Geetha [2] combined SENSEX points of the Indian stock market and really simple syndication (RSS) feeds for effective prediction of the stock market. Their results showed that the sentiment analysis of RSS news feeds has an impact on stock market values. Pasupulety et al. [16] employed sentiment analysis to evaluate the effectiveness of considering the public opinion of a company. They used a trained Word2Vec model and classified company-specific hash-tagged posts from Twitter as positive or negative. Their ensemble model is found to perform better than the constituent models and to depend highly on the nature and size of the training data.

Data and Summary Statistics
This study uses 11 select sector indices provided by S&P Global over the period from 21st of January 2020 to 20th of May 2020 (the index data are available at https://us.spindices.com/index-finder/). The sample period coincides with increasing news coverage of COVID-19 and Google searches for terms related to coronavirus. The big data used in this study include the Daily News Sentiment Index (DNSI) and Google Trends data on coronavirus searches. The DNSI is a high-frequency measure of economic sentiment based on lexical analysis of economics-related news articles from 16 major US newspapers compiled by the news aggregator service LexisNexis. Refer to Buckman et al. [23] and Shapiro et al. [24] for the detailed process of constructing the DNSI. The DNSI is constructed so that higher values indicate more positive sentiment. It is found to move downward with key historical events that have a significant impact on economic outcomes and financial markets, such as the start of the first Gulf War in August 1990, the Russian financial crisis in August 1998, the terrorist attacks of September 11, 2001, the Lehman Brothers bankruptcy in September 2008, and the federal government shutdown in October 2013.
Using Google Trends big data, we first search for terms that are related to the coronavirus and, from those terms, select the five that are most relevant to the economy. We then obtain information on the relative frequency of Google searches for these five terms in the US as well as worldwide: "coronavirus", "laid off", "unemployment", "recession", and "vaccine". In this sense, these search terms capture people's general interest in the coronavirus as well as its potential impact on the economy. Panel A of Table 1 shows the sample distribution of the 11 select sector indices: communication services, consumer discretionary, consumer staples, energy, financial, healthcare, industrial, information technology, materials, real estate, and utilities.
The sample distribution of DNSI is shown in Panel B of Table 1. Table 1 also reports the sample statistics for the four months prior to the sample period (the no-COVID-19 period from 21st of September 2019 to 20th of January 2020) to compare them with those in the sample period (the COVID-19 period). As expected, the descriptive statistics of sector indices in Table 1 reveal that the mean values of the S&P 500 index and of most select sector indices, except for health care and information technology, have fallen as the coronavirus has begun spreading to the US. It is interesting to note that the health care and information technology select sector indices have increased in the time of COVID-19. Consumer spending in some sectors has clearly been falling dramatically because of social distancing measures imposed across the country. In contrast, consumer spending on information technology has been more likely to increase for the same reason. In addition, the value of healthcare-related stocks appears to have increased because coronavirus vaccines and treatments have been urgently needed. All sector indices show a significant increase in volatility, which indicates that every sector in the US stock market has been subject to more variation since the emergence of coronavirus. The descriptive statistics of DNSI in Panel B of Table 1 show a decrease in mean value and an increase in volatility with the emergence of coronavirus. This result implies that coronavirus has brought about a sharp decline in news sentiment. In this sense, the DNSI over the sample period indicates people's coronavirus sentiment [23].
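The period comparison described above can be sketched in a few lines. This is a minimal illustration, assuming that volatility is measured as the sample standard deviation of daily index levels; the function names and the input values are hypothetical and are not the data behind Table 1.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) of a list of daily values."""
    return mean(values), stdev(values)


def compare_periods(pre_covid, covid):
    """Compare mean level and volatility between two sample periods.

    Assumption: volatility is the sample standard deviation of the series.
    """
    pre_mean, pre_sd = summarize(pre_covid)
    cov_mean, cov_sd = summarize(covid)
    return {
        "mean_change": cov_mean - pre_mean,     # negative if the index fell
        "volatility_ratio": cov_sd / pre_sd,    # above 1 if volatility rose
    }


# Made-up illustrative values, not data from Table 1.
pre = [100.0, 101.0, 102.0, 101.0, 100.0]
cov = [100.0, 90.0, 80.0, 85.0, 95.0]
result = compare_periods(pre, cov)
print(result)
```

A negative `mean_change` together with a `volatility_ratio` above one would correspond to the pattern reported for most sector indices.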
Figure 1 shows the relative frequency of Google searches for the five terms related to coronavirus and the economy over the period from 21st of January 2020 to 20th of May 2020. The relative frequencies of US and worldwide Google searches for the five terms increase sharply in March and have similar distributions. The relative frequencies of Google searches for "coronavirus", "laid off", and "recession" declined significantly in May, while those for "unemployment" and "vaccine" did not decrease as much.

Methodology
This study conducts a one-sided t-test to examine the significance of the relationship between COVID-19 sentiment and US stock markets by industry. Buckman et al. [23] showed that news articles mentioning the coronavirus or COVID-19 began around 20th of January 2020 and then rapidly increased to reach an astounding 95% of economics-related news articles by late March. This figure clearly shows that the decline in DNSI through mid-March coincided with the increased coverage of COVID-19.
As such, the DNSI can be used to assess COVID-19 sentiment, like the Google searches for terms related to coronavirus and the economy, over the sample period.
The links between COVID-19 sentiment and the stock market by industry are examined to identify how the effects of public mood about the coronavirus differ by industry. For this, we establish the following hypotheses for DNSI (H_1^D) and Google Trends data (H_1^G).
H_1^D: When people come across positive (negative) economic news, investors are more (less) likely to invest in the US stock market. That is, when the DNSI increases (decreases), the US stock market tends to increase (decrease).
H_1^G: When people search for terms related to coronavirus more (less) frequently, investors are less (more) likely to invest in the US stock market. That is, when the frequency of Google searches for coronavirus increases (decreases), the US stock market tends to decrease (increase).
Using the correlations (ρ) between DNSI and the 11 select sector indices, we test whether ρ is significantly positive, as H_1^D predicts.
Conversely, using the correlations between Google Trends data and the 11 select sector indices, we test whether ρ is significantly negative, as H_1^G predicts.
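The two one-sided tests can be written out as follows. This is a sketch under stated assumptions: the null is taken to be zero correlation, the test statistic is the standard t-statistic for a Pearson correlation r computed from n paired observations, and the superscripts on ρ are notation introduced here to distinguish the DNSI and Google Trends correlations.

```latex
\begin{align}
H_0^{D}&: \rho^{D} = 0 \quad \text{vs.} \quad H_1^{D}: \rho^{D} > 0, \\
H_0^{G}&: \rho^{G} = 0 \quad \text{vs.} \quad H_1^{G}: \rho^{G} < 0, \\
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} \sim t_{n-2} \quad \text{under the null.}
\end{align}
```

Under this form, H_1^D is supported when t is large and positive, and H_1^G when t is large and negative.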
This study repeats the test using values of big data with lag 0, lag 1, lag 2, and lag 3 to investigate whether the effects of COVID-19 sentiment occur with a time difference. This additional analysis addresses a distinct feature that can benefit time-sensitive strategic investment decisions.
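The lagged test can be sketched as below. This is a minimal illustration, assuming that a lag of k pairs the sentiment value on day t - k with the index value on day t and that the t-statistic is the standard one for a Pearson correlation; the function names and the sample series are hypothetical and are not the study's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(sentiment, index, lag):
    """Correlate sentiment on day t - lag with the index on day t.

    Returns (r, t), where t = r * sqrt((n - 2) / (1 - r^2)).
    """
    if lag == 0:
        s, i = sentiment, index
    else:
        s, i = sentiment[:-lag], index[lag:]
    r = pearson(s, i)
    n = len(s)
    if abs(r) >= 1.0:
        # Perfect correlation: the t-statistic is unbounded.
        return r, math.copysign(math.inf, r)
    return r, r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up illustrative series, not data from the study.
sentiment = [0.1, 0.0, -0.2, -0.4, -0.3, -0.5, -0.6, -0.4]
index = [100, 99, 97, 92, 93, 90, 88, 91]
results = {lag: lagged_correlation(sentiment, index, lag) for lag in range(4)}
for lag, (r, t) in results.items():
    print(f"lag {lag}: r = {r:.3f}, t = {t:.2f}")
```

Each additional lag drops one observation from the paired sample, so the degrees of freedom of the test shrink slightly as the lag grows.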
We also examine whether changes in DNSI predict US industry returns differently. For this, we estimate a time series regression model with industry excess returns as the dependent variable, which is motivated by Garcia [8] and Bannigidadmath [25]. We generate the industry excess returns using the Fama-French three-factor model [26] (the Fama-French three factors are obtained from the online data library of Kenneth French, http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) and control for serial correlation and volatility of the returns by estimating model (3), where R_t is the excess return for each of the 11 select sector indices and ∆DNSI represents changes in DNSI.
The coefficient β_i specifies the effect of an increase in ∆DNSI on R_t, and ε_t is the error term. A Newey-West [27] procedure with 12 lags is used to correct standard errors for autocorrelation and heteroskedasticity.
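One plausible way to write model (3) is shown below. It is a sketch, not the estimated specification: it assumes lags 0 to 3 of ∆DNSI (the lags reported in Table 3) and Garcia-style controls for lagged returns and lagged squared returns. The control coefficients γ_j and δ_j and their lag length p are assumptions introduced here. The second line gives the standard Bartlett weights used by the Newey-West estimator with L = 12 lags.

```latex
\begin{align}
R_t &= \alpha + \sum_{i=0}^{3} \beta_i \,\Delta\mathrm{DNSI}_{t-i}
      + \sum_{j=1}^{p} \gamma_j R_{t-j}
      + \sum_{j=1}^{p} \delta_j R_{t-j}^{2} + \varepsilon_t, \\
w_\ell &= 1 - \frac{\ell}{L+1}, \qquad \ell = 1, \dots, L, \quad L = 12.
\end{align}
```

In this form, a positive and significant β_i means that an increase in news sentiment i days earlier is associated with a higher industry excess return on day t.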

Significance of DNSI on US Stock Market by Industry
Table 2 shows test results for DNSI and the 11 select sector indices, with correlations and t-statistics for lag 0, lag 1, lag 2, and lag 3 DNSI by industry. The results show that the DNSI is significantly positively related to all select sector indices at the 1% significance level, regardless of time lag. The positive correlation indicates that investors are more (less) likely to invest in the US stock market when people come across positive (negative) economic news. The financial sector index shows the highest correlation, over 0.95, and the energy and industrial select sector indices exhibit high correlations, over 0.9, with lag 0, lag 1, lag 2, and lag 3 DNSI. These results imply that US stock investors in the financial, energy, and industrial sectors tend to be more sensitive to daily economic news. On the other hand, the health care select sector index shows the lowest correlation, below 0.6, regardless of the time lag of DNSI.
The consumer discretionary and information technology select sector indices are found to have correlations of about 0.7, while the communication services, consumer staples, materials, real estate, and utility select sector indices show correlations of about 0.8 with DNSI. Figure 2 shows changes in the relationship level between DNSI and select sector indices, with mean values of correlations by time lag difference. It is interesting to note from Figure 2 that the level of correlation decreases as the time lag of DNSI increases for most select sector indices, except for the financial, real estate, and utility sectors, implying that COVID-19 sentiment seems to have an immediate impact on the day's investment.

Significance of COVID-19 Related Google Searches on US Stock Market by Industry
Tables A1-A5 in Appendix A show test results with relative frequencies of Google searches for the five terms related to coronavirus and the economy for the 11 select sector indices. The tables report correlations and t-statistics for lag 0, lag 1, lag 2, and lag 3 relative frequencies of Google searches by industry. Figure 3 presents changes in the magnitude of negative correlation between US Google searches for the five terms and select sector indices, with mean values of correlations by time lag difference.
Test results in Tables A1-A5 show that US and worldwide Google searches for the five terms are significantly negatively related to all select sector indices at the 1% significance level regardless of time lag, except for the health care index. The test results for US (worldwide) lag 2 (lag 1) Google searches of "unemployment" show a significant relationship with the health care select sector index at the 5% significance level, but the health care index is found to have no significant relationship with US (worldwide) lag 3 (lag 2 and lag 3) Google searches. The negative correlation indicates that investors are less (more) likely to invest in the US stock market when people search for terms related to coronavirus more (less) frequently.
The significance of the relationship between Google searches of "coronavirus" and the 11 select sector indices is shown in Table A1. The magnitude of the negative correlation between "coronavirus" Google searches and the 11 select sector indices ranges from 0.65 to 0.95, which is the highest range among the five search terms considered in this paper. The communication services, consumer discretionary, and information technology sectors show a high level of correlation, while the financial and utility sectors show a low level of correlation with "coronavirus" Google searches across the time lags. The "coronavirus" plot in Figure 3 illustrates that while most select sector indices show a stable relationship across the time lags, the information technology and health care (real estate and utility) sectors experienced decreases (increases) in the relationship at lag 3. This result implies that the frequency of Google searches of "coronavirus" on a day has a higher impact on investment in the information technology and health care sectors on the same day than a few days later.
The significance of the relationship between Google searches of "laid off" and the 11 select sector indices is shown in Table A2. The magnitude of the negative correlation between "laid off" Google searches and the 11 select sector indices ranges from 0.45 to 0.90. The communication services and consumer discretionary sectors show a high level of correlation, while the financial and utility sectors show a low level of correlation with "laid off" Google searches across the time lags. The "laid off" plot in Figure 3 illustrates that the level of correlation decreases as the time lag increases for all select sector indices. In particular, the health care sector experiences noticeable decreases at lag 3. These results indicate that the level of people's interest in "laid off" on a day has a higher impact on investment in most sectors on the same day than a few days later.
The significance of the relationship between Google searches of "unemployment" and the 11 select sector indices is shown in Table A3. The magnitude of the negative correlation between "unemployment" Google searches and the 11 select sector indices ranges from 0.30 to 0.80. The financial and industrials sectors show a high level of correlation, while the health care and information technology sectors show a low level of correlation with "unemployment" Google searches across the time lags. It is interesting to note from Panels A and B in Table A3 that there is no significant relationship between the health care index and US (worldwide) lag 3 (lag 2 and lag 3) Google searches. As with "laid off" Google searches, the "unemployment" plot in Figure 3 illustrates that the level of correlation decreases as the time lag increases for all select sector indices, and the ordering of the 11 industries by correlation level is generally maintained across the time lags. These results indicate that the level of people's concern about "unemployment" on a day has a higher impact on investment in most sectors on the same day than a few days later.
The significance of the relationship between Google searches of "recession" and the 11 select sector indices is shown in Table A4. The magnitude of the negative correlation between "recession" Google searches and the 11 select sector indices ranges from 0.45 to 0.85. The consumer discretionary, health care, and information technology sectors show a high level of correlation, while the financial and utility sectors show a low level of correlation with "recession" Google searches across the time lags. The "recession" plot in Figure 3 illustrates that the level of correlation increases as the time lag increases for most select sector indices. In particular, the health care, consumer staples, real estate, and utility sectors experience noticeable increases at lag 3. These results indicate that the frequency of Google searches of "recession" on a day has a higher impact on investment in most sectors a few days later than on the same day. In other words, the level of people's concern about "recession" shows a delayed impact on the stock market rather than an immediate impact.
The significance of the relationship between Google searches of "vaccine" and the 11 select sector indices is shown in Table A5. The magnitude of the negative correlation between "vaccine" Google searches and the 11 select sector indices ranges from 0.60 to 0.90. The energy, financial, and industrial sectors show a high level of correlation, while the health care and information technology sectors show a low level of correlation with "vaccine" Google searches across the time lags. The "vaccine" plot in Figure 3 illustrates that the level of correlation increases as the time lag increases for most select sector indices. In particular, the health care and real estate sectors experience noticeable increases at lag 3. This result implies that the frequency of Google searches of "vaccine" on a day has a higher impact on investment in the health care and real estate sectors a few days later than on the same day. As with "recession" Google searches, the level of people's interest in "vaccine" shows a delayed impact on the stock market rather than an immediate impact.

Results from Time Series Regression Models
The results from the time series regression model with changes in DNSI as the predictor are reported in Table 3. Table 3 reports estimated coefficients and p-values for lag 0, lag 1, lag 2, and lag 3 changes in DNSI from the time series regression model (3). The last column reports the coefficient of determination (R^2). We find that lag 0 and lag 1 changes in DNSI positively predict returns for energy and financial, while lag 0 and lag 3 changes in DNSI positively predict returns for industrials, at the 5% significance level. The communication services industry is found to have a significantly positive coefficient on lag 0 changes in DNSI. However, there is no evidence of predictability for other industries.

Discussion
By investigating the links between the 11 select sector indices and COVID-19 sentiment, measured by DNSI and the relative frequency of Google searches for five terms related to coronavirus and the economy, this study provides a comprehensive overview of the initial impact of COVID-19 sentiment on the US stock market and how it differs by industry. Based on the empirical results of the correlation analysis, the 11 industries are classified into three groups for each measure of COVID-19 sentiment to distinguish the impact of COVID-19 sentiment on the US stock market. Table 4 summarizes the industry classification for each measure of COVID-19 sentiment. Among the 11 industries, the communication services, consumer discretionary, industrial, energy, and material sectors fall in the high- or middle-level correlation group, while the utility sector falls in the middle- or low-level correlation group. The financial, information technology, and health care sectors appear in all three groups, whereas the real estate and consumer staples sectors appear only in the middle-level correlation group. The results in Table 4 reveal the distinct effects of COVID-19 sentiment across industries, and the comparison of COVID-19 sentiment by time lag difference shows its noticeable effect across industries. In addition, results from the time series regression model show that COVID-19 sentiment measured by DNSI positively predicts returns for the communication services, energy, financial, and industrials industries. In contrast, there is no evidence of predictability for the other industry returns, implying that industry return predictability by COVID-19 sentiment differs among the 11 industries. The economic intuition behind these results is that information diffusion differs across industries, leading to different predictability of returns across industries [28,29].
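The three-group classification can be illustrated with a short sketch. The cut-off values used here (0.9 and 0.7) are hypothetical and are not the thresholds behind Table 4. The example correlations are illustrative values chosen to be consistent with the ranges quoted in the text for DNSI, not the actual entries of Table 2.

```python
def classify_by_correlation(correlations, high=0.9, low=0.7):
    """Group sectors by the magnitude of their correlation with a sentiment measure.

    `high` and `low` are hypothetical cut-offs chosen only for illustration.
    """
    groups = {"high": [], "middle": [], "low": []}
    for sector, rho in correlations.items():
        magnitude = abs(rho)  # Google Trends correlations are negative
        if magnitude >= high:
            groups["high"].append(sector)
        elif magnitude >= low:
            groups["middle"].append(sector)
        else:
            groups["low"].append(sector)
    return groups


# Illustrative values only, not the entries of Table 2.
example = {
    "financial": 0.95,
    "energy": 0.91,
    "real estate": 0.80,
    "health care": 0.58,
}
groups = classify_by_correlation(example)
print(groups)
```

Using the absolute value lets the same function handle the positive DNSI correlations and the negative Google Trends correlations.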
The information from the empirical results is valuable for strategic investors, whose primary interest lies in identifying industries that are closely related to COVID-19 sentiment or rarely related to it. It may help fund managers adjust their portfolio risk exposure by trading stocks in industries that are significantly responsive to COVID-19 sentiment or in those that are not. It also offers valuable implications for forward-looking US stock market investors in the time of COVID-19.

Conclusions
Exploring stock market movement has attracted many researchers in multiple disciplines including finance, economics, computer science, statistics, and operations research. Recently, many researchers have shown that online information obtained from the public domain, such as news stories from the mainstream media and social media discussion such as tweets, can have a significant effect on decision making by stock market investors. In times of market crisis in particular, any positive or negative public sentiment related to a stock market crisis can have a ripple effect on investors' decision making.
While the COVID-19 pandemic has not ended, this study explores the initial impact of COVID-19 sentiment on the US stock market using DNSI and Google Trends big data on coronavirus-related searches. This study offers a comprehensive view of the initial impact of COVID-19 sentiment on the US stock market by industry by investigating the correlation between COVID-19 sentiment and 11 select sector indices as well as industry return predictability by COVID-19 sentiment. The empirical results reveal the distinct effects of COVID-19 sentiment across industries: the communication services, consumer discretionary, industrial, energy, and material sectors are classified in the high- or middle-level correlation group, while the utility sector is classified in the middle- or low-level correlation group. The financial, information technology, and health care sectors are classified in all three groups, whereas the real estate and consumer staples sectors are only included in the middle-level correlation group. By visualizing how the correlation level changes across time lags, the results also suggest how strategic investment planning can take time lags into account. In addition, results from the time series regression model demonstrate industry return predictability by DNSI for communication services, energy, financial, and industrials.
To sustain a competitive advantage in the time of the COVID-19 pandemic, stock market investors need not only to understand the nature of the market crisis caused by the sudden shock, but also to make strategic investment decisions that can realize positive returns or minimize the loss due to the shock. When stock markets suffer from a sudden and unprecedented shock like the COVID-19 pandemic, sentiment analysis using big data from social media is a particularly valuable source of information and can provide investors with insights that determine investment strategies. Since the COVID-19 pandemic is ongoing and it is still not possible to predict the extent of its impact across the world, this study has potential limitations. It may not be enough to draw a comprehensive conclusion about the impact of COVID-19 sentiment on the US stock market in this paper, but it is time to diagnose the current situation and search for a solution to the various changes in stock markets caused by the COVID-19 pandemic. While the COVID-19 pandemic is ongoing, this study is at the forefront of research on this issue. Using updated data on the COVID-19 pandemic and stock markets, future research can be enriched by analyzing what the COVID-19 pandemic has brought to global stock markets and by developing new investment strategies for asset managers in the time of COVID-19.

Figure 1. Relative frequencies of Google searches for five terms related to coronavirus over the period from 21st of January 2020 to 20th of May 2020. (a) "Coronavirus" Google Trends, (b) "Laid off" Google Trends, (c) "Unemployment" Google Trends, (d) "Recession" Google Trends, (e) "Vaccine" Google Trends.

Figure 2. Changes in the relationship level between Daily News Sentiment Index (DNSI) and 11 select sector indices by time lag difference.

Table 1. Sample distribution of 11 select sector indices and Daily News Sentiment Index.

Table 2. Test results for Daily News Sentiment Index.
Note: All test results are significant at the 1% significance level.

Table 3. Results from time series regression model (3).

Table 4. Industry classification based on the empirical results.