Relations among Bitcoin Futures, Bitcoin Spot, Investor Attention, and Sentiment

Abstract: This research investigates price discovery between the Bitcoin futures and spot markets and analyzes the impact of investor sentiment and attention on these markets. The study uses several statistical models to examine the short-term and long-term relations between these variables, including the bivariate Granger causality model, the ARDL and NARDL models, and the Johansen cointegration procedure with a vector error correction mechanism. The results suggest that there is no statistical evidence of price discovery between the Bitcoin spot price and futures, and the term structure of the Bitcoin futures neither enriches nor impairs this lead-lag relation. However, the study finds robust evidence of a long-run cointegrating relation between the two markets and the presence of asymmetry in them. Moreover, investor sentiment exhibits a lead-lag relation with both the Bitcoin futures and the spot markets, while investor attention leads only the Bitcoin spot market and shows no lead-lag relation with the Bitcoin futures. These findings highlight the crucial role of investor behavior in affecting both Bitcoin futures and spot prices.


Introduction
A recent survey conducted on the most widely held financial assets reported that cryptocurrency is the second most widely held financial asset¹, especially among women. Among the various cryptocurrencies, Bitcoin has been recognized as the best-performing asset in recent times². Introduced in 2008, Bitcoin is a fascinating addition to the financial markets. Among the 9420 cryptocurrencies traded around the world, Bitcoin is generally considered the leading cryptocurrency in terms of market share³. In addition to being a dominant asset, Bitcoin has received much attention, not only from regulators and the media but also from academic researchers and investment participants in the financial markets. In fact, the Bitcoin futures were first introduced in December 2017, owing to the popularity that Bitcoin has enjoyed since its launch. The introduction of Bitcoin futures has facilitated the hedging of risks related to the underlying Bitcoin spot or other traded cryptocurrencies. It is important to note that although the Bitcoin market share has decreased over time, from more than 90% in 2010 to about 47.13% in 2023, it remains one of the most important digital currencies in the cryptocurrency market.
Several factors substantiate the uniqueness of Bitcoin among all the listed cryptocurrencies⁴ and the Bitcoin futures⁵. As the investigation into the regulation of cryptocurrencies has gained significance, so has the research into cryptocurrencies, specifically the Bitcoin spot price and Bitcoin futures. Most of the existing research has focused on the determinants of price discovery in the Bitcoin markets (Alexander et al. 2020; Alexander and Heck … This additionally adds validity to the robustness of our significant long-term relations between the series.
The subsequent sections of this research are organized as follows. Section 2 presents an extensive examination of the pertinent literature. Section 3 discusses the methods or research design. Section 4 describes the data. Section 5 presents the empirical evidence as results, and Section 6 concludes the paper.

Literature Review
The Bitcoin futures were introduced in December 2017 by the Chicago Board Options Exchange (CBOE) and, a week later, by the Chicago Mercantile Exchange (CME). These Bitcoin futures were primarily created to provide two major functions, i.e., price discovery and hedging in the underlying spot Bitcoin market, and past research has highlighted the efficacy of the Bitcoin futures as a hedging tool (Sebastiao and Godinho 2020; Alexander et al. 2020; Nekhili 2020; Kochling et al. 2019). Although some empirical studies have shown that the Bitcoin futures are an effective instrument for hedging, there is also evidence of the limitation of the hedging properties of these futures, warranting further investigation into their dynamics (Hung et al. 2021; Alexander and Heck 2020; Hattori and Ishida 2020). Likewise, one of the most important functions that futures contracts provide for the underlying spot asset is price discovery (Silber 1981).
In general, the prices of these futures contracts mirror the expectations of investors in the corresponding asset market for the near future. This anticipation should be factored into the values of the underlying spot assets, contributing to a process of price discovery. The evidence of price discovery from past research is mixed, confirming the need to investigate the relations further. Although some shared factors influence both the Bitcoin spot prices and futures, there is evidence suggesting that the futures have a relatively greater impact on the price discovery process (Akyildirim et al. 2020; Kapar and Olmo 2019). Furthermore, there is alternative evidence, using Hasbrouck's (1995) information share methodology, indicating that the Bitcoin spot price leads and exerts an influence on the futures prices with regard to price discovery (Baur and Dimpfl 2019; Corbet et al. 2018). These inconsistent findings could be a result of the relatively small data sample that is used in these studies, as well as a sharp decline in the market conditions for Bitcoin⁶, further highlighting the need to analyze these relations with larger sample periods. There is also evidence that as the Bitcoin futures contracts become shorter and shorter, the accuracy of the futures contract in aiding the price discovery of the underlying spot market increases (Matsui and Gudgeon 2020). This highlights the need not just to reinvestigate the lead-lag relation between the Bitcoin futures and spot but also to use the near-term and next-term futures contracts to identify the magnitude of this linkage. The preceding discussion suggests that, in the literature, there is no prevailing consensus regarding the price discovery process in the Bitcoin market.
In addition to the extant literature looking at the short-run dynamics between the Bitcoin futures and spot, several studies look at the long-run relations between these assets (Lee and Rhee 2022; Wu et al. 2021; Hung et al. 2021; Hu et al. 2020; Kapar and Olmo 2019; Cheah et al. 2018). Some past research evidence shows a long-run relation between the Bitcoin futures and spot and suggests that the futures dominate the spot assets using a fractionally cointegrated framework (Wu et al. 2021; Cheah et al. 2018). Although the fractional cointegration framework allows the underlying series to take fractional integration values and analyzes the long-run equilibrium relations, it has no allowance for deterministic trends, and the convergence to equilibrium does not necessarily have an optimal rate. Additionally, the memory parameter is unknown in the fractionally cointegrated system. These limitations support the estimation of the long-run dynamics using a standard cointegrated framework. Some studies have looked at these long-run relations assuming a time-varying cointegrating coefficient (Lee and Rhee 2022; Hu et al. 2020), but the challenge with these findings is the oversight of the common factors that remain stable and that affect both the Bitcoin futures and spot simultaneously over time.
Notably, the challenge with these studies substantiates the need to revisit the analysis of long-run relations using a standard cointegration framework that allows for asymmetry in addition to relaxing the assumptions of the order of integration of the underlying series.
Intrinsically, research has been conducted to analyze the aspects that provide Bitcoin with its value or the factors behind its price fluctuations. It has been distinctly observed that the Bitcoin price reflects several more factors than the standard interactions of demand, supply, and fundamental news (Griffin and Shams 2020; Eross et al. 2019; Panagiotidis et al. 2019). To further understand the impact of non-standard factors on the pricing of the Bitcoin futures and spot, some studies have looked at the role of social interactions in Bitcoin prices (Lyocsa et al. 2020; Gronwald 2019; Geuder et al. 2019; Garcia et al. 2014). While these studies use different socio-economic signals, like the volume of information searched about Bitcoin, the prices in online exchanges, the volume of word-of-mouth communication in online media, and user base growth, they show evidence that highlights the impact of online searches, word of mouth, and an expanding user base on Bitcoin pricing and not the pricing of the Bitcoin futures. The reason for these findings is that as the media publishes articles about price increases in Bitcoin, it fosters and acts as a stimulus for search activities among investors, thereby impacting trading activity and the pricing of the underlying asset. These findings are equally robust for negative news or unfavorable attention, which can result in significant price declines (Chevapatrakul and Mascia 2019). In essence, while a Google search of a traditional currency will not impact its value or volume, it can, in all likelihood, drive the prices of Bitcoin (Aalborg et al. 2019). Owing to the extensive spectrum of available socio-economic signals and the limited few that have been studied in past research, it is necessary to further analyze the impact of these signals on both the Bitcoin futures and the spot assets using alternate measures.
Further, in looking at the factors that give Bitcoin its value, a small number of studies have looked at the impact of sentiment or public opinion on the price movements of Bitcoin (Entrop et al. 2020; Rognone et al. 2020; Dastgir et al. 2019; Aalborg et al. 2019; Karalevicius et al. 2018). While some results support the view that sentiment does affect Bitcoin pricing (Rognone et al. 2020; Karalevicius et al. 2018), there is also evidence that sentiment does not play any role in impacting the prices of Bitcoin (Dastgir et al. 2019; Entrop et al. 2020), and, most importantly, none of these past studies look at the relation of sentiment with the Bitcoin futures. This paper addresses this limitation by looking at the impact on both the futures and the spot. An indication of bi-directional causality between the Bitcoin attention variable measured by Google Trends search queries and the Bitcoin asset returns softens any explanatory linkage (Dastgir et al. 2019; Fry 2018) and limits itself to looking at only the spot asset. Although these results are conflicting and limited, the impact that social media plays in the acceptance of Bitcoin as an asset and the ensuing trading activity are broadly analyzed, with results indicating that bullish posts predict positive returns and bearish posts predict negative returns (Chen et al. 2020; Mai et al. 2018). The aforementioned discussion of sentiment and its impact on Bitcoin pricing implies that there is no conformity in the literature on the linkage between sentiment and Bitcoin pricing, supporting the need to investigate this relation further.
In summary, after examining the relevant literature pertaining to the Bitcoin futures, Bitcoin spot, investor attention, and sentiment, along with identifying its incongruities, this study attempts to answer several questions that include the following: Do the Bitcoin futures and the Bitcoin spot have any long-run relation? Conditional on the existence of a long-run relation, is there any asymmetry in this long-run relation? Do the Bitcoin futures and the Bitcoin spot have any short-run causal relation? Do the Bitcoin futures and Bitcoin attention have any short-run causal relation? Do the Bitcoin futures and Bitcoin sentiment have any short-run causal relation? Conditional on the existence of a short-run causal relation between the Bitcoin futures and spot, do the shorter-term Bitcoin futures (near-term versus next-term) have greater statistical significance in the short-run causal relation with the Bitcoin spot?

Methodology
We are interested in evaluating whether the movements in the Bitcoin futures, the Bitcoin spot prices, crypto sentiment, and investor attention have a short-term causal or long-term cointegrated relation. The interest in the relations among these variables arises from the limitations in the extant literature identified in the previous sections. We accomplish this analysis by following four distinct steps. First, we investigate the stationarity of each series used in the study employing multiple tests (Dickey and Fuller 1979, 1981; Cheng et al. 2021; Kwiatkowski et al. 1992 (KPSS); Ng and Perron 2001). Second, after checking for stationarity, we evaluate the short-term causal relation using bivariate Granger causality tests (Granger 1969, 1980, 1988). Third, after determining the optimal lag lengths, we evaluate the long-term cointegrated relation using several methods. We use the Johansen cointegration method (Johansen 1988, 1991, 1995) as well as the autoregressive distributed lag (ARDL) method (Pesaran and Shin 1999; Pesaran et al. 1996, 2001) and the nonlinear autoregressive distributed lag (NARDL) method (Demir et al. 2021; Mhadhbi et al. 2021; Dutta et al. 2019; Shin et al. 2014). Fourth, and finally, we evaluate the error correction model. We provide detailed explanations of each method used in this study below.

Stationarity
Stationarity is invariance under a time shift: a stationary time series has properties that do not depend on the time at which they are observed. In general, a stationary time series will have no discernible pattern in the long term. Although the features of a constant mean and variance are not particularly imperative in estimating the parameters in econometric models, they can significantly impact model selection, since these features are essential for the calculation of reliable test statistics. Hence, it becomes essential to first test whether the relevant variables have a unit root and to determine the order of integration of each series used in the study (Enders 2014; Chan 2010) before we can determine the statistical specification of the model and conduct either a causality or cointegration test. We first conduct the stationarity tests using the standard augmented Dickey-Fuller (ADF) method (Cheng et al. 2021). The ADF test is a standard test for stationarity and is estimated using the following general equation.
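A standard way of writing the ADF regression that agrees with the definitions of δ and θ_i given in the text is sketched here. The intercept α, the trend coefficient β, and the error term ε_t are assumed symbols, since the text names only δ, θ_i, and the autoregressive order m.

```latex
\Delta Y_t = \alpha + \beta t + \delta Y_{t-1}
           + \sum_{i=1}^{m-1} \theta_i \, \Delta Y_{t-i} + \varepsilon_t
\tag{1}
```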
where δ = ∑_{i=1}^{m} ρ_i − 1 and θ_i = −∑_{k=i+1}^{m} ρ_k. The dependent variable Y_t is lagged to represent higher-order autoregressive processes and to eliminate serial correlation. In Equation (1), a time trend is also included. This is done to test for the presence of a deterministic trend. It is important to note that we consider and include a variable for a subsequent Johansen cointegration analysis if and only if it meets the criterion of being non-stationary and integrated of at least order one, I(1). Correspondingly, we consider and include a variable for the ARDL cointegration test only if it is not integrated of a higher order, i.e., I(2) or greater. The null hypothesis of the ADF test (a unit root exists, δ = 0) is rejected when the computed test statistic is more negative than the critical value, that is, greater in absolute value. Failing to reject it implies that the series is non-stationary, and we must difference it until it becomes stationary. To check the robustness of our findings and to verify the consistency of the stationarity properties, we also conduct the KPSS test (Kwiatkowski et al. 1992). Contrary to the ADF test, the KPSS test has a null hypothesis of no unit root (stationarity) and an alternative hypothesis of a unit root (non-stationarity). The KPSS test can be depicted by Equation (2) below.
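For reference, the usual level-stationarity form of the KPSS statistic is sketched here in the notation the text uses for η and the long-run variance estimator. The partial-sum symbol S_t, the residuals e_i from a regression of the series on a constant, and the sample size N are assumed symbols.

```latex
\eta = \frac{1}{N^{2}} \, \frac{\sum_{t=1}^{N} S_t^{2}}{\hat{\sigma}_N^{2}},
\qquad S_t = \sum_{i=1}^{t} e_i
\tag{2}
```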
where η denotes the KPSS statistic for testing stationarity around the mean and σ²_N is a consistent estimator of the long-run variance of the residuals. Past research has shown that both the ADF and the KPSS tests experience some size problems as well as finite-sample power problems (Sephton 2008). To address this issue, we also run stationarity tests using the Ng and Perron test, which uses generalized least squares (GLS) detrending (Ng and Perron 2001; Phillips and Perron 1988). The Ng and Perron test statistics are depicted in Equation (3) below.
where y^d_t represents the GLS-detrended data and σ²_N represents a consistent estimator of the long-run variance of the residuals. The null hypothesis of the Ng and Perron test is that a unit root exists, and the alternative hypothesis is that the series is stationary. If the computed MZ_a and MZ_t statistics are smaller (more negative) than the critical values, we reject the null hypothesis. The Ng and Perron test is also used to complement the results of both the ADF and KPSS tests.
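The MZ_a and MZ_t statistics of Ng and Perron (2001) are commonly written as below. This is a sketch in the symbols the text defines (y^d_t and the long-run variance estimator); the sample size N and the auxiliary statistic MSB are assumed notation.

```latex
MZ_a = \frac{N^{-1}\,(y^{d}_{N})^{2} - \hat{\sigma}_N^{2}}
            {2\,N^{-2}\sum_{t=1}^{N} (y^{d}_{t-1})^{2}},
\qquad
MSB = \left( \frac{N^{-2}\sum_{t=1}^{N} (y^{d}_{t-1})^{2}}{\hat{\sigma}_N^{2}} \right)^{1/2},
\qquad
MZ_t = MZ_a \times MSB
\tag{3}
```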

Granger Causality
After analyzing the stationarity properties of the series, we look at the short-term causal lead-lag relations using the bivariate Granger causality tests (Granger 1969, 1980, 2001). In essence, the method analyzes whether the past values of one time series help to predict another series by reducing the forecast error of the overall model. "Causality" here does not necessarily mean a cause-and-effect relation between the variables but rather the "precedence" of one variable over the other in time series data. In this way, the use of time series information facilitates an understanding of the direction of causality. The bivariate linear Granger causality used in this study can be shown by a generic two-equation model as below.
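A generic two-equation system of this kind is sketched here. The coefficient names a_i, b_i, c_i, and d_i are assumed for illustration; the lag length p and the errors U_t and V_t follow the definitions in the text.

```latex
Y_t = a_0 + \sum_{i=1}^{p} a_i Y_{t-i} + \sum_{i=1}^{p} b_i X_{t-i} + U_t \tag{4}
X_t = c_0 + \sum_{i=1}^{p} c_i X_{t-i} + \sum_{i=1}^{p} d_i Y_{t-i} + V_t \tag{5}
```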
where X_t and Y_t are stationary variables, the optimal lag length in the system is shown by p, and U_t and V_t are the random errors. To test whether X_t Granger causes Y_t, we need to determine whether any lags of X_t are statistically significant in Equation (4). We do this using an F-test for linear restrictions. Empirically, by testing the null hypotheses shown below, we test for the presence of a linear causal relation between Y_t and X_t.
By testing the hypotheses stated in Equation (6), we have several possibilities for causal relations between Y_t and X_t, which include either a lack of causal relation between the variables or a unidirectional or bidirectional relation between the variables.
In addition, it is important to emphasize that the results of the bivariate Granger causality tests can be affected by the choice of the lag lengths in Equations (4) and (5) (Thornton and Batten 1985; Guilkey and Salemi 1982). Specifically, if we use more lags than the true order, the power of the test will be affected. In addition, if we use fewer lags than the true order, the estimates from the regression will be biased. Moreover, the residuals from the regression will be serially correlated. Therefore, we adopt Hsiao's approach (Hsiao 1979, 1981, 1982) to select lag lengths that minimize the Akaike Information Criterion (AIC) and the prediction error (Akaike 1969a, 1969b, 1974, 1981). We use the unrestricted vector autoregression (VAR) procedure to examine each series for its optimal lag length, using the p-th order VAR model as expressed by Enders (Enders 2014; Chan 2010).
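Writing b_i for the coefficients on the lagged X terms in the Y equation and d_i for the coefficients on the lagged Y terms in the X equation (both names are assumed for illustration), the pair of null hypotheses can be sketched as:

```latex
H_0^{X \to Y}:\; b_1 = b_2 = \dots = b_p = 0
\qquad\text{and}\qquad
H_0^{Y \to X}:\; d_1 = d_2 = \dots = d_p = 0
\tag{6}
```

Rejecting only the first indicates that X_t leads Y_t, rejecting only the second indicates the reverse, rejecting both indicates a bidirectional relation, and rejecting neither indicates no lead-lag relation.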
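The F-test behind a bivariate Granger causality test can be illustrated with a small self-contained sketch. It is not the study's code: the data are simulated, the lag length is fixed by hand rather than chosen by AIC, and the helper names `solve`, `rss`, and `granger_f` are hypothetical. The statistic compares the residual sum of squares of a restricted regression (own lags only) with that of an unrestricted regression (own lags plus lags of the other series).

```python
import random


def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - rest) / m[i][i]
    return coef


def rss(rows, y):
    """Residual sum of squares of an OLS fit of y on the regressors in rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((v - sum(c * x for c, x in zip(coef, r))) ** 2
               for r, v in zip(rows, y))


def granger_f(x, y, p):
    """F-statistic for H0: lags 1..p of x add nothing to an AR(p) model of y."""
    times = range(p, len(y))
    target = [y[t] for t in times]
    restricted = [[1.0] + [y[t - i] for i in range(1, p + 1)] for t in times]
    unrestricted = [row + [x[t - i] for i in range(1, p + 1)]
                    for row, t in zip(restricted, times)]
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - (2 * p + 1)  # observations minus unrestricted parameters
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Simulated example (assumption): y follows x with a one-period delay.
random.seed(7)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.5) for t in range(1, 300)]
f_xy = granger_f(x, y, 2)  # tests whether x Granger causes y
f_yx = granger_f(y, x, 2)  # tests whether y Granger causes x
print(round(f_xy, 1), round(f_yx, 1))
```

In this simulated setting the first statistic should be far larger than the second, which is the pattern a unidirectional lead-lag relation would produce.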

Johansen Cointegration
The cointegration test is generally used to analyze whether there is a long-term relation between two or more time series. If a cointegrating relation is absent, the two variables can move arbitrarily through time and away from each other. In this paper, we analyze the existence of cointegration among the specified variables using three methods. The first method is explained in this sub-section, and the other two methods are explained in the following sub-section. The Johansen cointegration procedure (Johansen 1995; Engle and Granger 1987) is a general dynamic systems technique that allows for more than one cointegrating relation. Under this approach, the variables are parametrized in terms of the lagged levels of the system variables in addition to the lagged first differences. Consider a VAR model of order m as shown below in Equation (7).
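In the symbols the text goes on to define (Y_i, µ, A_1 … A_m, and ε_i), a VAR of order m takes the following standard form; the layout is a sketch rather than a reproduction of the original display.

```latex
Y_i = \mu + A_1 Y_{i-1} + A_2 Y_{i-2} + \dots + A_m Y_{i-m} + \varepsilon_i
\tag{7}
```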
where Y_i is an n by 1 vector of variables that are integrated of order one, denoted I(1), µ is the first moment of the series, A_1 … A_m are the coefficient matrices for each lag, and ε_i is the noise term with a mean of zero. If the vectors are cointegrated, we can form a vector error correction model (VECM). Then, Equation (7) can be modified, as shown below in Equation (8), where Δ is the differencing operator, Π is the coefficient matrix for the first lag, and Γ_j are the matrices for each differenced lag.
The Johansen test successively assesses whether the rank (r) is equal to zero or one, continuing up to r being equal to n − 1, where n is the number of time series variables used to conduct the test. The null hypothesis is that there is no cointegrating relation. When the rank is greater than zero, it indicates the existence of some cointegrating relation between the series examined. Ultimately, r is the number of cointegrating relations. Thus, α is the parameter in the vector error correction model (VECM) that acts as an adjustment parameter, which must have a negative sign and statistical significance, and β is the cointegration vector of each column of the Johansen model.
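Written out, the error correction form referred to as Equation (8) is usually given as below. The symbol Π for the coefficient matrix on the lagged level and its factorization into an adjustment matrix α and cointegrating vectors β are the standard Johansen notation and are assumed here.

```latex
\Delta Y_i = \mu + \Pi \, Y_{i-1} + \sum_{j=1}^{m-1} \Gamma_j \, \Delta Y_{i-j} + \varepsilon_i,
\qquad \Pi = \alpha \beta'
\tag{8}
```

The rank r of Π equals the number of cointegrating relations, which is what the Johansen test determines.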
Specifically, for the Johansen procedure, this cointegration test is conducted in two main forms: trace tests (λ_trace) and maximum eigenvalue tests (λ_max). These are the primary tests, based on canonical correlations, used to determine the number of cointegrating vectors (CIVs) among the series of interest.
It is also important to note that, similar to the way in which the Granger causality tests were affected by the choice of the optimal lag length, the Johansen cointegration test is also affected by the lag length choice. Additionally, while the λ_trace value tests the null hypothesis that the number of cointegrating vectors is ≤ r against a general alternative hypothesis, the λ_max value tests the null hypothesis that the number of cointegrating vectors is = r against an alternative hypothesis of r + 1. The statistics of this procedure are defined by Equations (9) and (10) shown below.
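The two Johansen statistics are conventionally defined as follows. Here T denotes the number of usable observations and the estimated eigenvalues are ordered from largest to smallest; both are assumed symbols, while n and r follow the text.

```latex
\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\!\left(1 - \hat{\lambda}_i\right) \tag{9}
\lambda_{\max}(r,\, r+1) = -T \, \ln\!\left(1 - \hat{\lambda}_{r+1}\right) \tag{10}
```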

ARDL and NARDL Cointegration
We have previously examined a cointegration approach (Johansen 1991, 1995; Johansen and Juselius 1990; Engle and Granger 1987) to assess the long-term connection between the variables of interest. One of the major limitations of this approach is that the series have to be integrated of at least order one, I(1). To overcome this problem, this study employs the widely recognized ARDL model (Pesaran et al. 2001). The ARDL model offers the flexibility that the variables can be integrated of different orders, as it is the most general dynamic unrestricted model in the economic literature (Sari et al. 2008; Ghatak and Siddiki 2001). Additionally, a dynamic error correction model can be derived from the ARDL model by using a linear transformation, which will essentially enhance the speed of adjustment towards equilibrium (Banerjee et al. 1993). A standard linear ARDL(p, q) cointegration model with two time series Y_t and X_t has the form shown in Equation (11).
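One common conditional error correction form of the ARDL(p, q) model, consistent with the level coefficients ρ and θ, the deterministic vector Z_t, and the error E_t named in the text, is sketched below. The short-run coefficients γ_i and ω_i and the intercept vector α are assumed symbols.

```latex
\Delta Y_t = \alpha' Z_t + \rho \, Y_{t-1} + \theta \, X_{t-1}
           + \sum_{i=1}^{p-1} \gamma_i \, \Delta Y_{t-i}
           + \sum_{i=0}^{q-1} \omega_i \, \Delta X_{t-i} + E_t
\tag{11}
```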
where Z_t is a vector of deterministic regressors and E_t is an iid stochastic process. According to the null hypothesis, the two series are not cointegrated, implying that the coefficients of the lagged levels of the two variables in Equation (11) are jointly zero (ρ = θ = 0). The hypothesis of this model can be tested using a modified F-test or a t-test, as shown in the prior literature (Pesaran et al. 2001). It is important to note that the combination of stochastic regressors in the ARDL model is linear and signifies symmetric adjustments in both the long and short run. We can further extend this methodology to include nonlinearities (Demir et al. 2021; Mhadhbi et al. 2021; Dutta et al. 2019; Shin et al. 2014). The nonlinear ARDL(p, q) model is depicted below in Equation (12).
This NARDL model is capable of explaining the asymmetry in the long-run relation, and the model's hypothesis can be evaluated through the use of the bounds testing procedure, similar to the ARDL model (Demir et al. 2021; Pesaran et al. 1996, 2001; Mhadhbi et al. 2021; Shin et al. 2014).
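Following the general construction in Shin et al. (2014), the nonlinear model splits X_t into partial sums of its positive and negative changes and gives each its own coefficients. The form below is a sketch; the superscripted coefficients and the short-run terms γ_i and ω_i are assumed notation.

```latex
\Delta Y_t = \alpha' Z_t + \rho \, Y_{t-1} + \theta^{+} X^{+}_{t-1} + \theta^{-} X^{-}_{t-1}
           + \sum_{i=1}^{p-1} \gamma_i \, \Delta Y_{t-i}
           + \sum_{i=0}^{q-1} \left( \omega_i^{+} \Delta X^{+}_{t-i} + \omega_i^{-} \Delta X^{-}_{t-i} \right) + E_t
\tag{12}

X^{+}_t = \sum_{j=1}^{t} \max(\Delta X_j, 0), \qquad
X^{-}_t = \sum_{j=1}^{t} \min(\Delta X_j, 0)
```

Long-run symmetry corresponds to θ⁺ = θ⁻; a rejection of that restriction is evidence of asymmetry.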
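The asymmetry in a NARDL specification comes from decomposing the regressor into cumulative positive and negative changes, following Shin et al. (2014). A minimal sketch of that decomposition is given here; the function name `partial_sums` and the sample values are hypothetical.

```python
def partial_sums(levels):
    """Split a series into cumulative positive and negative changes.

    Returns two lists aligned with levels[1:], so that at every date
    pos + neg equals the total change since the first observation.
    """
    pos, neg = [], []
    up = down = 0.0
    for prev, cur in zip(levels, levels[1:]):
        change = cur - prev
        up += max(change, 0.0)     # accumulates increases only
        down += min(change, 0.0)   # accumulates decreases only
        pos.append(up)
        neg.append(down)
    return pos, neg


# Hypothetical series, for illustration only.
levels = [1.0, 3.0, 2.0, 5.0, 4.0]
pos, neg = partial_sums(levels)
print(pos, neg)
```

The two resulting series would enter the regression as separate regressors, which is what lets the long-run response to increases differ from the response to decreases.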

Error Correction Model
Based on the outcomes of the Johansen, ARDL, and NARDL cointegration tests, the establishment of a cointegrating relation between variables Y_t and X_t implies the existence of a long-term equilibrium relation between them. However, to evaluate the short-run relations and properties of the cointegrated series, we use the error correction model (ECM) technique. In brief, the ECM, consistent with the long-run cointegrating relation, represents how Y_t and X_t behave in the short term. This ECM is shown to contain important information on both the long-term and short-term properties of the model with disequilibrium. For example, if two variables are integrated of order one, I(1), and there is a linear combination of them that is integrated of order zero, I(0), then we will have an error correction term (ECT) that is statistically significant and has a negative sign, indicating the speed of adjustment towards equilibrium. The general equation for the error correction model can be represented as shown below in Equations (13) and (14).
where, most importantly, M, N, O, P, Q, and R are the coefficients of the above models; ϕ is the coefficient of adjustment towards equilibrium in the long term; U and V are random error terms; and ECT denotes the deviations from the long-term equilibrium between the two lagged series. The ECM, in a way, captures the speed of adjustment at which a dependent variable returns to equilibrium. Moreover, in comparing the ECT of the Johansen procedure with those of the ARDL and NARDL approaches, the ECT of the NARDL model should have the most enhanced speed of adjustment towards equilibrium among the three models considered. This should be followed by the ARDL model and the Johansen model (Demir et al. 2021; Mhadhbi et al. 2021; Nkoro and Uko 2016; Shin et al. 2014; Banerjee et al. 1993).
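A generic two-equation ECM of this kind is sketched below. The assignment of the letters M through R to intercepts and lag coefficients, the subscripts on ϕ, and the lag length p are assumptions made for illustration; only the symbol names themselves come from the text.

```latex
\Delta Y_t = M + \sum_{i=1}^{p} N_i \, \Delta Y_{t-i} + \sum_{i=1}^{p} O_i \, \Delta X_{t-i}
           + \phi_1 \, ECT_{t-1} + U_t \tag{13}
\Delta X_t = P + \sum_{i=1}^{p} Q_i \, \Delta X_{t-i} + \sum_{i=1}^{p} R_i \, \Delta Y_{t-i}
           + \phi_2 \, ECT_{t-1} + V_t \tag{14}
```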

Data
To evaluate and analyze the relations between the variables used in this study, three separate types of data with daily frequency were gathered: (1) Bitcoin futures and spot data, (2) investor attention data, and (3) investor sentiment data. The Bitcoin futures and spot data were obtained from Bloomberg. The attention data were gathered from Google Trends (Google Search Volume Index, GSVI), which allows us to compare the relative popularity of search terms for specific time periods and regions. The Bitcoin sentiment data, which are based on a multifactorial crypto market sentiment analysis, were collected from the alternative.me website. The data for all variables used in this study span the period from 1 February 2018 to 8 September 2022. Figure 1 shows a graph of the Bitcoin spot, the two rolling Bitcoin futures, and the sentiment and attention variables through time, and Table 1 describes all the variables used in this study.

Bitcoin Futures and Spot
Bitcoin, a decentralized cryptocurrency and technically the crypto industry's first asset, was launched in 2009, following the white paper by Satoshi Nakamoto in 2008. There is a maximum supply of 21 million bitcoins, and their trading has gained significance over time, making Bitcoin the largest and most popular cryptocurrency by market capitalization (Howell et al. 2020; Hashemi Joo et al. 2020). The active trading of Bitcoin and the fervent interest of investors have given rise to both regulated and unregulated Bitcoin derivative markets. Bitcoin futures contracts were first offered and traded on 10 December 2017 by the Chicago Board Options Exchange (CBOE). Similarly, they were offered and first traded on 17 December 2017 by the Chicago Mercantile Exchange (CME). This study only includes the listings from the CME, gathered from Bloomberg, since the volumes traded on the CME are significantly larger than those for the CBOE and the listings are regulated by the Commodity Futures Trading Commission (CFTC)⁷. All CME Bitcoin futures contracts generally expire on the last Friday of the month and are cash-settled rather than taking the delivery of actual Bitcoin⁸.
We organize two Bitcoin futures price series. This is accomplished by grouping the two adjacent futures contracts based on time to maturity. The groups "nearest-term" and "next-term" represent the futures contracts that have the shortest and the second shortest time to maturity at a given date for the Bitcoin spot. The two futures price series are gathered, each with rolling contracts⁹, and are recorded in terms of another currency¹⁰. It is important to note that this price varies across exchanges, mainly due to different fee policies and cash-out methods. The implication of this is that the source that we gather our data from matters. Since exchange-based data providers present their own quotes, whereas extrinsic data providers like Bloomberg or Coinbase compute their own indexes, usually a weighted average of all prices across major exchanges, we use the data from Bloomberg, consistent with past research (Baur and Dimpfl 2021; Janson and Karoubi 2021; Hattori and Ishida 2020, 2021; Cermak 2017).
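The grouping into nearest-term and next-term contracts can be illustrated with a short sketch. It assumes one contract per calendar month that expires on the last Friday of that month, and it treats a contract as still live on its expiry day; the actual roll rule behind the Bloomberg series may differ. The function names `last_friday` and `nearest_and_next` are hypothetical.

```python
import calendar
from datetime import date, timedelta


def last_friday(year, month):
    """Return the date of the last Friday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # weekday(): Monday is 0 and Friday is 4, so step back to the latest Friday.
    return d - timedelta(days=(d.weekday() - 4) % 7)


def nearest_and_next(today, months_ahead=6):
    """Return the two closest monthly expiries on or after today.

    Assumption: one contract per calendar month, live through its expiry day.
    """
    expiries = []
    year, month = today.year, today.month
    for _ in range(months_ahead):
        expiry = last_friday(year, month)
        if expiry >= today:
            expiries.append(expiry)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return expiries[0], expiries[1]


# The last day of the sample period reported in the Data section.
nearest, following = nearest_and_next(date(2022, 9, 8))
print(nearest, following)
```

A rolling series would switch to the following contract once the nearest one expires, so that each daily observation always refers to the shortest and second shortest maturities.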

Investor Attention
In examining several investor attention measures, such as turnover, extreme returns, news, etc., past research has shown evidence that the Google Search Volume Index (GSVI) acts as a direct and unambiguous measure of retail investor attention; it leads all other investor attention measures, and an increase in GSVI results in an increase in stock liquidity due to the positive price pressure on the underlying asset from retail investors (Ding and Hou 2015; Da et al. 2011; Mondria et al. 2010). The GSVI measures the popularity of a search word relative to the entire volume of searches on Google (Woloszko 2020; Stephens-Davidowitz and Varian 2015), and an increasing GSVI measure does not imply that there are more searches currently than in the past. It only means that a larger share of the searches on Google are dedicated to the specific search word; hence, its use as a proxy for information demand should be undertaken with care. Nevertheless, since the users of Google search are a random sample of the total internet users, this measure lends itself to valid interpretation (Aslanidis et al. 2022; Tong et al. 2022). It is also important to note that the GSVI includes all searches that contain the word of interest, and it does not differentiate cases in which additional inconsequential words accompany the word of interest. In order to increase the relevance of the attention measure, we augment the main search word, "Bitcoin", with individual additional search words such as "Futures", "Price", "Spot", and "Trading", along with a plus operator, and we take the average of all the resulting measures for a particular date. These additional words and the plus operator aid in mitigating the problem of a resultant generic search, as well as the tacit challenge of smaller search volumes (Heyman et al. 2019; Yung and Nafar 2017; Han et al. 2017).
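The averaging step described above can be sketched as follows. The index values are made up for illustration, the query labels simply mirror the search words named in the text, and the function name `average_attention` is hypothetical.

```python
def average_attention(series_by_term):
    """Average several search-index series date by date.

    Only dates present in every series are kept, so each average is
    taken over the same number of search terms.
    """
    common = set.intersection(*(set(s) for s in series_by_term.values()))
    n_terms = len(series_by_term)
    return {
        d: sum(s[d] for s in series_by_term.values()) / n_terms
        for d in sorted(common)
    }


# Made-up index values on a 0-100 scale, for illustration only.
gsvi = {
    "Bitcoin + Futures": {"2018-02-01": 40, "2018-02-02": 60},
    "Bitcoin + Price": {"2018-02-01": 80, "2018-02-02": 100},
    "Bitcoin + Spot": {"2018-02-01": 20, "2018-02-02": 40},
    "Bitcoin + Trading": {"2018-02-01": 60, "2018-02-02": 80},
}
attention = average_attention(gsvi)
print(attention)
```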

Investor Sentiment
The Fear and Greed Index (FGI) for Bitcoin, which is used as a measure of investor sentiment in this study, is gathered from a unique and comprehensive data source 11 that, in essence, tries to capture the emotional state of the cryptocurrency market. The FGI is constructed based on the expectation that investors are likely to become greedy (fearful) when the market price for assets increases (decreases), inducing an overreaction. Extreme fear can result when Bitcoin prices are far below their intrinsic value, and too much greed can result when prices are far above what they should be worth (Gunay et al. 2022; Mokni et al. 2022). The FGI takes values between 0 and 100, where a value of 0 (100) indicates an emotional state of "Extreme Fear" ("Extreme Greed") among investors. This multifactor FGI is constructed from six distinct factors.
The FGI uses volatility (25%) as the first factor because an unanticipated volatility increase is typically construed as an indicator of fear in investors. The ratio of market momentum to volume (25%) is used as the second factor because high buying volumes in a positive market are typically considered greedy actions by investors. Social media (15%) is used as the third factor because an unusually high interaction rate on social media, in terms of the count of posts and various hashtags, is typically interpreted as an increase in public interest or corresponding greedy behavior by investors. Although the fourth factor, surveys or people polls (15%), is currently paused, it was included as a measure of direct perception by investors in the Bitcoin market for the time period for which we gather data. Dominance (10%) is used as the fifth factor because the market capitalization share of Bitcoin relative to all available cryptocurrencies, in essence, measures the substitution of investment capital and fear among investors. A rise in Bitcoin dominance is generally understood to be caused by a reduction of investments in speculative alt-coins. Over time, Bitcoin has become the safe haven of cryptocurrency. Finally, search trends (10%) are used as the sixth and last factor. In essence, the FGI uses both endogenous and exogenous factors in its construction (Gunay et al. 2022; Mokni et al. 2022; Guler 2021). To the best of our knowledge, this paper is the first to use the FGI to evaluate the linkages between investor sentiment and the Bitcoin futures and spot.
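The six weights can be combined as in the sketch below. It assumes that each factor has already been scaled to a 0 to 100 score and that the index is a simple weighted average; the data provider's actual aggregation may differ, and the component scores shown are hypothetical.

```python
# Factor weights as stated in the text (they sum to 100%).
FGI_WEIGHTS = {
    "volatility": 0.25,
    "momentum_volume": 0.25,
    "social_media": 0.15,
    "surveys": 0.15,
    "dominance": 0.10,
    "search_trends": 0.10,
}


def composite_index(scores, weights=FGI_WEIGHTS):
    """Weighted average of factor scores, each assumed to lie in [0, 100]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    for name in weights:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"score for {name} must be between 0 and 100")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical factor scores for a single day.
day_scores = {
    "volatility": 20,
    "momentum_volume": 40,
    "social_media": 60,
    "surveys": 50,
    "dominance": 30,
    "search_trends": 70,
}
fgi_value = composite_index(day_scores)  # roughly 41.5 on the 0-100 scale
```

Because the weights sum to one, the composite stays on the same 0 to 100 scale as its components.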

Descriptive Statistics, Returns, and Correlations
Table 2 exhibits the descriptive statistics for all variables across the entire study period. The average Bitcoin spot price (20,385.18) is quite similar to the average of both the near-term futures (20,413.64) and next-term futures (20,493.34). Both the Bitcoin futures (F1 and F2) are marginally higher than the spot (F0) in average price levels. The median values show a similar relation between the futures and the spot, but they are only about 50% of the average values. The average Bitcoin sentiment (43.31) and its respective median (40) indicate that the sentiment on average is classified as "Fear" 12. Similarly, the mean Bitcoin attention (50.90) is very close to the median (48). All the variables used in this study show positive skewness, indicating longer right tails, and the Jarque-Bera test results show evidence that none of the variables is normally distributed. The kurtosis values for all the variables are greater than +2, indicating a peaked distribution.
This table shows the descriptive statistics of the entire period's data in levels. The summary statistics reported include mean, median, max, min, standard deviation, skewness, kurtosis, Jarque-Bera, probability, sum, sum square deviation, and number of observations. The variables are defined in Table 1.
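The skewness, kurtosis, and Jarque-Bera statistics reported above are related as in the following sketch. It uses the standard population-moment definitions and the textbook Jarque-Bera formula; the sample values are hypothetical and chosen only to show a long right tail.

```python
def skew_kurtosis(x):
    """Sample skewness and (non-excess) kurtosis from central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(x):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4); large values reject normality."""
    n = len(x)
    s, k = skew_kurtosis(x)
    return n / 6 * (s ** 2 + (k - 3) ** 2 / 4)


# Hypothetical right-skewed sample: one large value stretches the right tail.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
skewness, kurtosis = skew_kurtosis(sample)
jb_stat = jarque_bera(sample)
```

Under normality the statistic is asymptotically chi-square distributed with two degrees of freedom, so values above roughly 5.99 reject normality at the 5% level.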
Table 3 illustrates the descriptive statistics for returns or changes in the different series. Panel A gives the summary statistics, and Panel B gives the counts of each series within the specified percentage bins. In Panel A, the average daily Bitcoin futures return (0.22%) is almost 29% higher than the average daily return of the Bitcoin spot (0.17%). Moreover, the longer-term rolling futures have a lower median value than the shorter-term rolling futures. It should be noted that the Bitcoin futures (F1 and F2) offer higher returns on average compared to the Bitcoin spot (F0), but their standard deviations are higher as well. Regarding the count distribution of the returns shown in Panel B, most of the returns for the Bitcoin spot and futures fall within the range of −5% to +5% (approximately 75% of the returns data) 13. Looking at the direction and the count distribution of the returns, it is clear that the Bitcoin futures and spot prices move in the same direction, indicating minimal benefit for hedging. The sentiment and attention variables indicate that investors pay more attention to the Bitcoin asset when overall sentiment can be classified as "Fear" or "Extreme Fear".
Table 4 presents the Spearman rank-order correlation statistics. As is evident, the correlations between the near-term and next-term Bitcoin futures and the spot are close to +1, reinforcing the inference that we made when looking at the count distribution of returns, namely that these securities offer minimal benefit for diversification. Interestingly, while the sentiment variable is also positively correlated with all the Bitcoin variables (F0, F1, and F2), the attention variable is negatively correlated with them. The correlations of the attention variable are marginally lower than zero, indicating that if the attention to Bitcoin among investors increases, a likely reason is fear sentiments, which would plausibly exert negative price pressure on both the Bitcoin spot and futures.
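The Spearman rank-order correlation used in Table 4 is the Pearson correlation of the ranks of two series. The sketch below is a minimal illustration with average ranks for ties; the helper names and the price values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical spot and futures prices that always move in the same direction.
spot = [100.0, 105.0, 103.0, 110.0]
futures = [101.0, 107.0, 104.0, 112.0]
rho = spearman(spot, futures)
```

Because only the ordering matters, two series that always rise and fall together have a rank correlation of +1 even when their price levels differ.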

Empirical Results
To facilitate the investigation of both the short-term and long-term relations between the Bitcoin futures, spot, sentiment, and attention variables, we need to empirically evaluate the stationarity of each series and identify the order of integration. Since both the bivariate Granger causality tests used to analyze the short-term relations and the Johansen test used to evaluate the long-term relations are sensitive to the choice of lag length, we need to examine the lag structure resulting from an unrestricted VAR model for each time series and determine the optimal lag lengths. The long-term relations can also be evaluated using the ARDL and NARDL tests, which eliminate the limitation on the order of integration imposed by the Johansen procedure. The NARDL model specifically allows for an asymmetric long-run relation between the variables. The following sections analyze the results of each of these tests, and, in the end, we evaluate an error correction model for the Johansen, ARDL, and NARDL procedures.

Stationarity and Optimal Lag Length
Examining the stationarity properties of all the variables is critical in performing a cointegration analysis. The ADF (Dickey and Fuller 1979, 1981), KPSS (Kwiatkowski et al. 1992), and Ng and Perron (Ng and Perron 2001) tests are used to test for a unit root. Panel A in Table 5 presents the ADF results, and Panel B presents the KPSS and Ng and Perron results. Regardless of whether we include only an intercept or both an intercept and a trend when ADF tests are performed on the level data of the various series, we do not reject the null hypothesis of non-stationarity for three out of the five variables under examination. The Bitcoin futures (F1 and F2) and the Bitcoin spot (F0) variables show evidence of a unit root and must be first differenced. The null hypothesis is rejected for the sentiment and attention variables, and both these variables are stationary in level form. For the three series (F0, F1, and F2) that have unit roots in level form, the Panel A results of the ADF test on the first differences reject the null hypothesis of non-stationarity. Thus, these three series become stationary after first differencing and are integrated of order one, I(1). In contrast to the ADF test, which posits the null hypothesis of non-stationarity, the KPSS test sets the null hypothesis as a stationary series and the alternative hypothesis as non-stationarity. Panel B in Table 5 presents results obtained for level data, which reject the null hypothesis for three of the variables (F0, F1, and F2) and fail to reject it for the sentiment and attention variables. We fail to reject the null hypothesis for the F0, F1, and F2 series when the KPSS tests are applied to the first differenced data. The results of the KPSS tests are consistent with the ADF tests. Since these stationarity tests (ADF and KPSS) are known to suffer potentially from severe finite sample power and size problems (De Jong et al. 2007; Keblowski and Welfe 2004; DeJong et al. 1992), we further augment the stationarity results by conducting the Ng and Perron test. This test provides good power and reliable size properties to reconfirm the results of the ADF and KPSS tests. In Panel B of Table 5, we present the Ng and Perron test results. Based on the results, we fail to reject the null hypothesis of a unit root in levels for variables F0, F1, and F2. After first differencing, all three non-stationary series become stationary, and these results are consistent with the previous ADF and KPSS tests.
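The logic of testing levels and then first differences can be sketched with a simplified Dickey-Fuller regression. This is not the augmented test used in the paper: it omits the lagged difference terms, uses an intercept only, and compares against the approximate asymptotic 5% critical value of −2.86. The simulated random walk is hypothetical data, and the function name is illustrative.

```python
import random


def dickey_fuller_t(y):
    """t-statistic on rho in: dy_t = a + rho * y_{t-1} + e_t (no lagged dy)."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(dy)
    mean_x, mean_dy = sum(x) / n, sum(dy) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((v - mean_x) * (d - mean_dy) for v, d in zip(x, dy))
    rho = sxy / sxx
    intercept = mean_dy - rho * mean_x
    rss = sum((d - intercept - rho * v) ** 2 for v, d in zip(x, dy))
    std_err = (rss / (n - 2) / sxx) ** 0.5
    return rho / std_err


# Simulated random walk (a unit-root process), for illustration only.
random.seed(0)
walk, level = [], 0.0
for _ in range(500):
    level += random.gauss(0, 1)
    walk.append(level)
first_diff = [b - a for a, b in zip(walk[:-1], walk[1:])]

CRITICAL_5PCT = -2.86  # approximate asymptotic value, intercept-only case
t_level = dickey_fuller_t(walk)
t_diff = dickey_fuller_t(first_diff)
reject_unit_root_in_diff = t_diff < CRITICAL_5PCT
```

A series is treated as I(1) when the unit-root null is not rejected in levels but is rejected in first differences, which is the pattern reported for F0, F1, and F2.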
In conjunction with the stationarity tests, it is critical to examine the lag structure of the unrestricted VAR model for each series in determining the optimal lag length. Table 6 shows the results of the lag length analysis. Following past research, the lag length is chosen using the Akaike Information Criterion (AIC) (Hatemi-J and Hacker 2008; Akaike 1974, 1981). The results in Table 6 show that the AIC selects eight lags as optimal for the Bitcoin futures (F1 and F2) and spot (F0) variables. Although the lag lengths for the sentiment and attention variables are reported, they cannot be used in the cointegration analysis since these variables are already stationary and not integrated of at least order one, I(1). The eight lags that are determined will be used in the long-run relation analysis for the respective variables.
This table shows the VAR-optimal lag length analysis, the results of which are used to conduct the cointegration analysis. Since sentiment and attention are not integrated of at least order one, they are not included in the cointegration analysis. The table shows the optimal number of lags as per different criteria, i.e., LR, FPE, AIC, SC, and HQ. The optimal lag length is selected based on AIC. The variables are defined in Table 1. ** indicates significance at the 5% level.
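Lag selection by AIC can be sketched as below. This is a minimal illustration, assuming an OLS-fitted VAR with intercept and one common form of the criterion, ln|Σ| + 2(number of coefficients)/T; statistical packages may scale the criterion differently. The simulated bivariate series and the helper names are hypothetical.

```python
import numpy as np


def var_aic(data, p, max_lag):
    """AIC of a VAR(p) with intercept, fitted by OLS on a common sample."""
    n_obs, k = data.shape
    y = data[max_lag:]
    # Column blocks: intercept, then lag 1, ..., lag p of every variable.
    x = np.hstack(
        [np.ones((n_obs - max_lag, 1))]
        + [data[max_lag - i: n_obs - i] for i in range(1, p + 1)]
    )
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    t = y.shape[0]
    sigma = resid.T @ resid / t
    return float(np.log(np.linalg.det(sigma)) + 2.0 * (p * k * k + k) / t)


def select_lag(data, max_lag):
    """Return the lag with the smallest AIC and the full AIC table."""
    aic = {p: var_aic(data, p, max_lag) for p in range(1, max_lag + 1)}
    return min(aic, key=aic.get), aic


# Simulated stationary series with two lags of dependence (illustration only).
rng = np.random.default_rng(0)
series = np.zeros((400, 2))
for step in range(2, 400):
    series[step] = 0.5 * series[step - 1] - 0.3 * series[step - 2] + rng.normal(size=2)

best_lag, aic_table = select_lag(series, max_lag=8)
```

All candidate lag lengths are fitted on the same sample (dropping the first `max_lag` observations) so that their AIC values are comparable.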

Granger Causality
We next analyze the short-term relations between the different variables using the bivariate Granger causality test (Granger 1969, 1980). "Causality", in this case, simply implies that the past values of one variable can be used to predict the future values of another variable and is tested using a standard F-test. The results of the Granger causality tests are shown in Table 7. The pairwise Granger causality tests between the Bitcoin spot (F0) and near-term (F1) and next-term (F2) Bitcoin futures show evidence of bidirectional causality. This indicates that the market between the futures and spot for Bitcoin is efficient, and there is no price discovery function between these assets. The informational content that affects the prices of both the futures and spot for Bitcoin is effectively reflected in the prices simultaneously. In contrast, all three variables (F0, F1, and F2) have unidirectional causality with the Bitcoin sentiment variable. The lagged values of the Bitcoin spot as well as the Bitcoin futures show evidence of predicting the future values of Bitcoin sentiment. There is a clear lead lag relation between the price movements of the Bitcoin spot and futures and the Bitcoin sentiment variable. The evidence supports the short-term dependence of the sentiment variable on the price movements of the spot and futures variables.
Evaluating the results of the retail investor Bitcoin attention variable, there is a unidirectional causal relation between the attention and Bitcoin spot variable and a bidirectional causal relation between the attention and Bitcoin futures variables. This evidence implies that retail investors who pay attention to the Bitcoin asset engage in transactions of the asset in the spot market and effectively impact the price of the Bitcoin spot. Interestingly, however, the information content in the same attention variable does not translate into activity in the Bitcoin futures asset, and the relation between them is efficient. Finally, the attention and sentiment variables have no causal relation with each other. Overall, these findings are significant, and this study is the first to collectively highlight these short-term relations.
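The F-test behind a bivariate Granger causality test compares a regression of a series on its own lags with one that also includes lags of the candidate cause. The sketch below is a minimal illustration on simulated stationary data; non-stationary series would be first differenced beforehand. The function names and the simulated series are hypothetical.

```python
import numpy as np


def lagged(v, p):
    """Columns v_{t-1}, ..., v_{t-p}, aligned with v[p:]."""
    return np.column_stack([v[p - i: len(v) - i] for i in range(1, p + 1)])


def residual_sum_squares(x, y):
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    return float(resid @ resid)


def granger_f(cause, effect, p):
    """F-statistic for H0: lags of `cause` do not help predict `effect`."""
    y = effect[p:]
    const = np.ones((len(y), 1))
    restricted = np.hstack([const, lagged(effect, p)])
    unrestricted = np.hstack([restricted, lagged(cause, p)])
    rss_r = residual_sum_squares(restricted, y)
    rss_u = residual_sum_squares(unrestricted, y)
    df_den = len(y) - unrestricted.shape[1]
    f_stat = ((rss_r - rss_u) / p) / (rss_u / df_den)
    return f_stat, (p, df_den)


# Simulated data in which x leads y by one period (illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = np.zeros(300)
y[1:] = 0.8 * x[:-1] + rng.normal(size=299)

f_xy, dof = granger_f(x, y, p=2)  # x -> y: expected to be large
f_yx, _ = granger_f(y, x, p=2)    # y -> x: no built-in dependence
```

Running the test in both directions distinguishes unidirectional causality (only one F-statistic is significant) from bidirectional causality (both are significant).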

Johansen, ARDL, and NARDL Cointegration
This study employs three cointegration techniques to analyze the long-term relation between the Bitcoin spot (F0) and the Bitcoin futures (F1 and F2). First, we employ the Johansen cointegration test (Johansen 1988, 1991). Second, we employ the ARDL bounds cointegration test (Sari et al. 2008; Pesaran et al. 2001). Third, we employ the NARDL bounds cointegration test (Demir et al. 2021; Mhadhbi et al. 2021; Shin et al. 2014). As part of the Johansen test, we use both the trace test statistic and the maximum eigenvalue statistic to test the hypothesis of cointegration. Having confirmed that all three series (F0, F1, and F2) are integrated of the same order, I(1), and having identified the optimal lag length of eight lags, we conduct the multivariable Johansen test on these three variables and show the results in Table 8. The results in Table 8 indicate that between the Bitcoin spot (F0) and the two futures, near-term (F1) and next-term (F2), there is evidence of at least one cointegrating relation. The number of cointegrating relations is tested sequentially, starting at zero and incrementing to one and two. The null hypotheses of having zero and one cointegrating vectors are both rejected at the 5% significance level. We fail to reject the final null hypothesis of two cointegrating vectors after analyzing both the trace and the maximum eigenvalue test statistics. This implies that there is at least one long-run relation between the variables analyzed.
Following the results of the Johansen procedure, we relax the requirement of the series being integrated of at least order one, I(1), and conduct the ARDL test. The results of the ARDL test are shown in Table 9. Panel A presents the statistics for the bounds test, and Panel B shows the long-run coefficients. As part of the ARDL bounds cointegration test, we use the F-test and the t-test statistics to test the hypothesis of cointegration. Since the absolute values of both the F-test and the t-test are greater than the upper bound I(1) statistic values, we reject the null hypothesis and determine that these variables do have a long-run cointegrating relation among them. Following the results of the ARDL procedure, we relax the requirement of a symmetric relation between the Bitcoin futures and spot and conduct the NARDL test, which allows for asymmetry. The results of the NARDL test are shown in Panels C and D in Table 9. Panel C presents the statistics for the bounds test, and Panel D shows the long-run coefficients. As part of the NARDL bounds cointegration test, we use the F-test to evaluate the results. Since the absolute values of the F-test are greater than the upper bound I(1) statistic values, we reject the null hypothesis of no cointegrating relations and conclude that these variables do have a long-run cointegrating relation among them. Moreover, the responses to both lagged positive and negative changes are significant, with the response to lagged positive changes being stronger than the response to lagged negative changes. This is captured in the long-run coefficients in Panel D of Table 9. In summary, the findings of the Johansen, ARDL, and NARDL procedures are consistent, and there is evidence of at least one long-run relation between the Bitcoin futures and spot variables and the existence of asymmetry in this relation.
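The asymmetry in the NARDL approach of Shin et al. (2014) comes from splitting each regressor into partial sums of its positive and negative changes. The sketch below illustrates that decomposition only, not the full NARDL estimation; the price values and variable names are hypothetical.

```python
def partial_sums(x):
    """Cumulative sums of positive and negative changes in a series.

    Returns two lists aligned with x[1:]. At every date,
    positive sum + negative sum = x[t] - x[0].
    """
    pos, neg = [], []
    cum_pos = cum_neg = 0.0
    for prev, cur in zip(x[:-1], x[1:]):
        change = cur - prev
        cum_pos += max(change, 0.0)
        cum_neg += min(change, 0.0)
        pos.append(cum_pos)
        neg.append(cum_neg)
    return pos, neg


# Hypothetical futures price path: up 2, down 1, up 3.
futures_path = [1.0, 3.0, 2.0, 5.0]
futures_pos, futures_neg = partial_sums(futures_path)
```

Entering the two partial sums as separate regressors lets the long-run response of the spot to futures price increases differ from its response to decreases.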

Error Correction Model
Given that our cointegration analysis identifies at least one long-run cointegrating relation between the Bitcoin futures and spot variables, along with some asymmetry, the short-run dynamics need to be analyzed using the error correction model (ECM) instead of the VAR model. The general equation of the error correction model is shown in Equations (12) and (13) of Section 3. To establish a long-run equilibrium between the series analyzed, the coefficient on the error correction term, or the speed of adjustment, must satisfy two conditions. First, the coefficient should be negative; second, it should be statistically significant.
The results of the ECM are presented in Table 10. The dependent variable used in the ECM is the Bitcoin spot (F0), and the independent variables are the Bitcoin futures (F1 and F2 or F1_POS, F1_NEG, F2_POS, and F2_NEG) for the Johansen, ARDL, and NARDL procedures. Using the optimal lag length of eight and the ECM from the Johansen procedure, the coefficient in Table 10 satisfies both the conditions of being negative and statistically significant. This implies that the variables share a common trend, which describes the long-run relation between them. To augment these findings, we also estimate the ECM under the ARDL and NARDL procedures with automatic selection of lag lengths. The respective coefficients satisfy both requirements, and the results are consistent with the error correction coefficient from the Johansen procedure. These results not only reinforce the existence of a long-run relation but also show evidence of asymmetry in this relation. In summary, the relation between the Bitcoin futures and spot undergoes an adjustment process through time towards long-run equilibrium, and the NARDL model has the fastest speed of adjustment towards long-run equilibrium.
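For orientation, a generic textbook form of an error correction model for the spot price is sketched below. The notation is illustrative and may differ from Equations (12) and (13); the symbols $\alpha$, $\lambda$, $\beta_i$, $\gamma_j$, $\delta_j$, and the lag orders $p$ and $q$ are assumptions introduced only for this sketch.

```latex
\Delta F0_t = \alpha + \lambda \, ECT_{t-1}
  + \sum_{i=1}^{p} \beta_i \, \Delta F0_{t-i}
  + \sum_{j=1}^{q} \gamma_j \, \Delta F1_{t-j}
  + \sum_{j=1}^{q} \delta_j \, \Delta F2_{t-j}
  + \varepsilon_t
```

Here $ECT_{t-1}$ is the lagged deviation from the estimated long-run relation and $\lambda$ is the speed of adjustment. The two conditions stated above correspond to $\lambda < 0$ and a statistically significant estimate of $\lambda$. As a purely hypothetical example, $\lambda = -0.2$ would mean that about 20% of the previous period's disequilibrium is corrected in the current period.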

Conclusions
The main objective of this research is to investigate the causal relations in the short term and the cointegrated relations in the long term among the Bitcoin spot price and futures, investor sentiment, and investor attention. Previous research on the cryptocurrency market has produced mixed results concerning the price discovery function between the Bitcoin assets. It has also been limited in scope in analyzing the long-term relation between the Bitcoin futures and spot and obscure in examining the impact of investor sentiment and attention on Bitcoin assets. This study aims to provide conclusive evidence of the price discovery function between the spot price and futures and investigate the effect of investor sentiment and attention on both the Bitcoin futures and spot assets. To achieve these objectives, the study employs the bivariate Granger causality methodology to investigate the short-term relations among the variables of interest. Furthermore, to investigate the long-term relations among the variables of interest, the study employs the ARDL, NARDL, and Johansen cointegration procedures with an error correction mechanism.
While the measures of investor attention (GSVI) and crypto sentiment (FGI) used in this study enable us to capture the interlinkages with Bitcoin assets, we wish to highlight the limitations in scope and the existence of alternate measures (Ali et al. 2022). The purpose of this study is not to look at all the available measures of sentiment and investor attention; we leave this to future research. Moreover, from a methodology perspective, there are alternate methods, such as the cross-quantilogram, dynamic conditional correlation (DCC), wavelet multiscale decomposition, and quantile-on-quantile methods, which could be employed to look at the interlinkages, and we leave this also to future research. Importantly, we also do not answer two important questions as part of this study. These relate to identifying the nature of the participants in the Bitcoin market and the levels of attention at which they become active participants in the asset market, thereby influencing the prices. While there is some recent work on these questions (Ülkü et al. 2023), we highlight the need to examine this relation as part of future work. Overall, our focus is only on Bitcoin assets, and, in the future, we will extend this to include all other crypto assets.
In summary, this research contributes to the existing Bitcoin pricing literature by providing the first empirical examination of the interlinkages between the Bitcoin futures, Bitcoin spot, investor sentiment, and attention as a collective. The study has five main findings. First, there is no statistical evidence of price discovery between the Bitcoin futures and the spot market, and the term structure of the Bitcoin futures does not play a significant role in this relation. Second, there is statistical evidence of a lead lag relation between Bitcoin sentiment and both the Bitcoin futures and spot prices. This suggests that changes in market participants' opinions and perceptions lead to trading activity in both markets, affecting the prices of both assets. Third, investor attention statistically leads the Bitcoin spot market but does not exhibit any lead lag relation with the Bitcoin futures. Fourth, the study finds statistical evidence of a long-run cointegrating relation between the Bitcoin futures and spot prices, and the results are robust to the type of cointegration procedure used. Finally, the speed of adjustment towards equilibrium in the long-run cointegration relation is stronger and more significant under the NARDL and ARDL procedures than under the Johansen procedure, despite the robustness of all testing procedures.
The findings of this study have practical implications for retail and institutional investors, portfolio managers, regulators, and institutions such as the SEC, CBOE, and CME in understanding the interplay of these assets and market forces for optimal trading decisions and market regulation. In conclusion, this study contributes to the finance literature by establishing connections between the Bitcoin spot, futures, investor sentiment, and investor attention in the cryptocurrency markets. The consideration of investor sentiment and attention allows for a better understanding of the drivers of the spot and futures market and the identification of both long-term and short-term and lead lag relations, ultimately leading to a deeper comprehension of investor and asset behavior in the financial markets. Lastly, these findings can aid retail investors, institutional investors, portfolio managers, and regulators in making informed trading decisions, considering the price discovery between the Bitcoin futures and the spot market, as well as the impact of investor sentiment and attention on Bitcoin prices, thereby enhancing the efficiency of cryptocurrency markets.

Table 1. Description of the variables used in this paper. This table provides the descriptions of the different variables used in this research paper.

Table 2. Descriptive statistics in levels.

Table 4. Correlation statistics. This table shows the Spearman rank correlation coefficients among the Bitcoin spot, near-term Bitcoin futures, next-term Bitcoin futures, Bitcoin sentiment, and Bitcoin investor attention. The variables are defined in Table 1.

Table 5. Unit root test statistics.

Ng and Perron Tests on First Difference (With Intercept and Trend)
This table shows the unit root statistics for each variable used in the study using levels and first differences. Panel A presents the unit root test results using the augmented Dickey-Fuller tests. Panel B shows the unit root test results using the KPSS and the Ng and Perron tests. The variables are defined in Table 1. ** indicates significance at the 5% level.

Table 7. Granger causality results. This table shows the bivariate Granger causality test results. The variables are defined in Table 1. In the null hypothesis, when the sentence starts with D() of a particular variable name, it implies that this series had to be first differenced to make it stationary. ** indicates significance at the 5% level.

Panel C: NARDL Cointegration Bounds Test Test Statistic Value Lower Bound-I(0) Upper Bound-I(1)
This table presents both the ARDL and NARDL cointegration results. Panel A shows the bounds test results for the ARDL model and Panel B shows the long-run coefficients from the ARDL model. Panel C shows the bounds test results for the NARDL model and Panel D shows the long-run coefficients from the NARDL model. The variables are defined in Table 1. ** indicates significance at the 5% level.