Estimating the capital asset pricing model with many instruments: A Bayesian Shrinkage approach

We propose a Bayesian approach to estimating the capital asset pricing model (CAPM) in the presence of measurement errors, using a large set of instruments together with shrinkage priors over the parameters associated with the instrumental variables. When the instrumental variable approach is used to estimate the CAPM, the challenge is to find "strong" instruments for the market portfolio, although the data-rich environment available in finance allows us to work in a many-instrument (possibly weak) setting. We use regularization priors to handle the large number of instrumental variables and to improve inference on the estimated CAPM beta. Using simulated data, we find that our approach can reduce the average bias caused by errors-in-variables by up to 90% relative to traditional two-stage least squares. This reduction, however, is attenuated as the number of instruments increases. In an empirical application, we find that the estimated CAPM beta of a stock is greater than the ordinary least squares estimate, while the traditional two-stage least squares estimate converges to the admittedly biased least squares estimate.


Introduction
A well-established fact in financial econometrics is that estimating the capital asset pricing model (CAPM) requires a surrogate for the market return, since the true market return is unobservable (Roll, 1977; Stambaugh, 1982; Prono, 2015; Simmet and Pohlmeier, 2020). The use of such a surrogate introduces an error-in-variables (EIV) problem that biases the estimates and makes the interpretation of a rejection of the model ambiguous (Roll, 1977). In this model, the traditional instrumental variable solution to the EIV problem is challenging: although many instruments are available for the market return, they are weakly correlated with it, which makes it difficult to elicit instruments. This paper proposes a Bayesian approach to estimate the capital asset pricing model using a large set of instruments and shrinkage priors over the parameters associated with the instruments.
Ignoring the measurement error present in the CAPM has drastic consequences for evaluating the model empirically. The biased estimates may mismeasure an asset's exposure to systematic risk, and the error also affects tests of the model's validity. If a test indicates rejection of the CAPM, there are alternative interpretations: either the model is false, or the surrogate for the market return is inappropriate. This point was extensively discussed in Roll (1977) and is commonly known as Roll's (1977) critique.
The literature has tried to address Roll's (1977) critique in several ways (Stambaugh, 1982; Shanken, 1987b,a; Jagannathan and Wang, 1996; Jegadeesh et al., 2019); one of them is instrumental variable estimation (Coën and Éric Racicot, 2007; Racicot et al., 2019). This approach requires finding instruments, that is, variables that are correlated with the market return but uncorrelated with the error term of the single-factor CAPM equation. On the one hand, the martingale property of the market return and the difficulty in predicting it limit the IV approach (Simmet and Pohlmeier, 2020). On the other hand, the data-rich environment available in finance allows us to use a large instrument set, even if these instruments are weakly related to the market return. We can use, for instance, individual asset returns as instruments: since the market return results from all asset returns, each asset (and possibly functions of them) can explain the market return to some degree, forming a large set of candidate instruments.
Using many instruments may itself be a source of bias in the estimates (Bekker, 1994; Newey and Smith, 2004; Ng and Bai, 2009). In terms of the two-step generalized method of moments, for example, it implies a large number of moment conditions, which produces biased estimates (Newey and Windmeijer, 2009). However, combining regularization methods with instrumental variable estimation can overcome this drawback. There are several approaches to this regularization, ranging from factor analysis, as in Bai and Ng (2010), to methods that explore the covariance of the instruments (Carrasco, 2012). An alternative approach is Bayesian estimation with shrinkage priors over the parameters of the instruments. In particular, Hahn et al. (2018) proposed a factor-motivated prior structure for the many-instruments setting. In this context, the Bayesian approach is interesting because the first and second stages are jointly estimated, allowing a mutual influence. Thus, Hahn et al.'s (2018) proposal to combine shrinkage with a large set of instruments is attractive.
In asset pricing models, this combination is an unexplored field. In this paper, we propose to estimate the capital asset pricing model using a large set of instruments and shrinkage priors over the parameters associated with the instruments. To do so, we use the Bayesian approach proposed by Hahn et al. (2018) to shrink unimportant instruments, and we compare the size of the estimated bias with that produced by traditional estimation methods (ordinary least squares and two-stage least squares).
Our results show that Bayesian regularization priors over the instrument coefficients may improve inference about the CAPM beta. We carry out two simulation exercises: the first considers that the market return is a weighted sum of asset returns, and in the second we simulate a classical measurement error problem. In the first experiment, we find that our approach can reduce the average bias by up to 90%, but this gain is attenuated as the number of instruments increases. In the second experiment, the average bias is reduced by only 35%, and, again, the improvement shrinks as the set of instruments is enlarged. In an empirical application, we find that the estimated CAPM beta of a stock is greater than the ordinary least squares (OLS) estimate, while the traditional two-stage least squares estimate converges to the admittedly biased OLS estimate.

The CAPM and measurement errors
The seminal work of Markowitz (1959) prepared the framework for the Capital Asset Pricing Model (CAPM). The author established the investor's problem in terms of a trade-off between risk and return and defined the mean-variance efficiency concept for a portfolio allocation. This definition states that, for a given level of return, a portfolio is mean-variance efficient if it minimizes the variance. Sharpe (1964) and Lintner (1965) built on Markowitz's (1959) results to analyze the implications for asset prices and developed what is called the Sharpe-Lintner CAPM, or just the CAPM.
By assuming that investors possess homogeneous expectations, meaning that they agree about the distribution of future returns, Sharpe (1964) and Lintner (1965) showed that, in the absence of market frictions, if all investors choose an efficient portfolio, then the market portfolio is also mean-variance efficient. In this context, the market portfolio includes all assets in the economy, for instance, stocks, real estate, and commodities, which makes it an unobserved variable. In practice, usual surrogates for the market portfolio are market indexes, such as the S&P 500, but these indexes do not contain all assets and, consequently, the market portfolio is observed only with error. Despite this practical difficulty, the efficiency of the market portfolio implies a relation between an asset's risk premium and the market risk premium:

E[R_i] - R_f = β_i (E[R_m] - R_f),  (1)

where β_i ≡ σ_im / σ_m^2, σ_im is the covariance between the return of asset i and the market return, σ_m^2 is the variance of the market return, and R_f is the risk-free rate. Therefore, the CAPM summarized in equation (1) is an equilibrium result that holds for a single period.
The relation established in equation (1) for one period is not enough to empirically assess the CAPM. To proceed with the econometric analysis, an additional assumption is required: returns are independent and identically distributed over time and multivariate normal. Although this is a strong hypothesis, it has some benefits. First, it is consistent with the CAPM holding in each period. Moreover, it is a good approximation for monthly returns (Campbell et al., 1997). Under this assumption, the CAPM may be represented by the single index model,

R_it - R_ft = γ_i + β_i (R_mt - R_ft) + ε_it.  (2)

In equation (2), if γ_i is equal to zero, then the CAPM holds in each period.
The representation of the CAPM given by equation (2) started a tradition of testing the model that became known as the time series approach. To empirically test the CAPM, Jensen et al. (1972) proposed using time series of asset returns, the risk-free rate, and a proxy for the return of the market portfolio to estimate equation (2). A usual choice for the risk-free asset is the US Treasury bill, and the S&P 500 for the return of the market portfolio. Their approach then tests whether the estimated intercept is equal to zero, which may be done using a Wald test or the test proposed by Gibbons et al. (1989).
Testing the CAPM using the Jensen et al. (1972) approach is problematic once one recognizes that the return of the market portfolio, R_mt, is observed only with error. The measurement error arises because the market indexes used to estimate the model contain only a subset of assets. Moreover, even if the whole universe of assets were observed, measurement error could appear due to misspecification of the asset weights. This problem is known as Roll's critique, after Roll (1977), who argued that, since the market portfolio is not observed, the CAPM cannot be tested: a rejection of the CAPM could be due to measurement error in the return of the market portfolio. In an econometric sense, this is a case of classical measurement error, and it should be treated as such.
To put the problem in terms of classical measurement error, let R̃_mt denote the observed return of the market portfolio. Also, denote by x*_t the excess return on the true market portfolio and by x_t the excess return on the observed market portfolio. The excess return on asset i is denoted by y_it, and there is no error in this variable. Instead of equation (2), the model to be estimated to test the CAPM should be

y_it = γ_i + β_i x*_t + ε_it,  (3)
x_t = x*_t + u_t.  (4)

Equation (4) assumes that the error in the excess return on the market portfolio is additive. If one ignores this additive measurement error and estimates equation (3) by least squares with x_t in place of x*_t, then the estimates of the betas will suffer from attenuation bias and the intercept will be biased upward, implying positive alphas even if the CAPM holds. Thus, to appropriately deal with the error-in-variables problem, equations (3) and (4) must be considered jointly when estimating and testing the CAPM.
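The attenuation bias described above is easy to reproduce numerically. The following sketch (our own illustration, with arbitrary parameter values) regresses a simulated asset return on a noisy market proxy and compares the OLS slope with the theoretical attenuation factor:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta_true = 1.2

x_star = rng.normal(0.0, 1.0, n)                   # true excess market return
u = rng.normal(0.0, 0.5, n)                        # additive measurement error
x_obs = x_star + u                                 # observed proxy, as in equation (4)
y = beta_true * x_star + rng.normal(0.0, 0.3, n)   # equation (3) with gamma_i = 0

# OLS slope of y on the noisy proxy
beta_ols = np.cov(y, x_obs)[0, 1] / np.var(x_obs)

# Theoretical attenuation factor: var(x*) / (var(x*) + var(u)) = 1 / 1.25 = 0.8
print(beta_ols)  # close to 1.2 * 0.8 = 0.96, well below beta_true
```

The slope estimate shrinks toward zero by exactly the signal-to-total-variance ratio, which is the bias the IV approach is meant to remove.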

Literature Review
This section relates previous research to our paper, identifying a gap that remains unexplored. Two strands of literature are associated with our research. The first is the literature that estimates and tests the CAPM while accounting for measurement errors. In general, our paper is related to the time series approach proposed by Jensen et al. (1972), also known as one-pass regression. The two-pass regression proposed by Fama and MacBeth (1973) is another way to estimate the CAPM and is also subject to the error-in-variables problem. In this sense, research that uses the two-pass regression approach and treats the underlying measurement error is also related to our work to some degree; however, we focus on one-pass regression. The second relevant strand is the econometric literature on measurement errors, which considers new solutions to the problem, such as the selection or shrinkage of instruments in the many-instruments context.
Even before Roll's (1977) critique, the time series approach had to deal with measurement error problems, although of a different nature. The first attempt to test the CAPM was introduced by Jensen et al. (1972), who proposed regressing the excess return on a constant, γ_i, and the excess return on the market portfolio. If the estimation considers securities individually, the test of γ_i = 0 introduces inefficiency and bias, since the residuals of the individual regressions may be correlated, as argued by King (1966); estimating with a single security ignores the information contained in the other assets correlated with it. The solution was to group the securities into portfolios according to their risk, as measured by the associated beta. However, using betas estimated from the sample at hand to group the assets would introduce a selection bias due to measurement error: assets in the group of higher betas may be there either because they indeed have high betas or because their betas were overestimated (positive measurement error), and vice-versa. To avoid this problem, Jensen et al. (1972) used pre-sample estimated betas to form ten different portfolios and tested the CAPM for these ten regressions. The results show that the CAPM is not valid for all portfolios when the CRSP 1 index is used as a proxy for the market return.
The measurement error discussed by Roll (1977) has a different nature: it stems from the fact that only a proxy of the market return is observable. Roll argues that, given the latency of the market return, we cannot test the CAPM, since alternative interpretations of a rejection may arise: (i) the theory is false; (ii) the surrogate for the market return is not adequate; or (iii) both (i) and (ii). Although the Jensen et al. (1972) approach does not deal with the latency of the market return, together with Roll's (1977) criticism it opened a new research agenda. On the one hand, the one-pass regression approach needed to be generalized to jointly test the hypothesis of no intercept and to provide a framework to estimate the model for each asset instead of grouped securities. The generalization to jointly test the hypothesis γ_i = 0 for all i was subsequently introduced by Gibbons et al. (1989), who estimate a model similar to Jensen et al. (1972) but propose a joint test of γ_i = 0; their results also point to the failure of the CAPM. On the other hand, the tradition of using portfolios instead of individual assets has remained unchanged. Moreover, the measurement error introduced by the market return proxy requires suitable econometric methods.
After Roll's (1977) critique, several researchers tried to deal with measurement errors, using both Bayesian and classical methods. A first answer to Roll's (1977) critique came from a branch of research arguing that the latency of the market return is not an empirical problem. Stambaugh (1982) proposed considering alternative proxies for the market return and evaluating whether inferences about the theory's validity change. The chosen proxies ranged from a simple market portfolio, containing only shares, to a more complete portfolio, including real estate and consumption of durables in addition to stocks. Testing from the most complete market portfolio and sequentially excluding its components, the author concluded that the inference does not change and, therefore, that Roll's (1977) critique is not an empirical problem. In the same direction, examples of research that tried to build more precise proxies are Jagannathan and Wang (1996) and Campbell et al. (1997), which include human capital when measuring the aggregate return.
1 Chicago Center for Research in Security Prices
Using multiple proxies, Shanken (1987a) generalizes the problem in a Bayesian framework. The author considers linear combinations of multiple surrogates for the market portfolio and incorporates prior information about the correlation between the true market portfolio and the proxy used in estimation. If the prior belief about this correlation is sufficiently high, then the following result holds: a rejection of the theory using a proxy implies a rejection of the CAPM for the true market portfolio (Shanken, 1987b). In this sense, if one believes that the proxy is highly correlated with the true market portfolio, the latency of the market portfolio is not an empirical problem.
In contrast to the traditional frequentist approach, Bayesian inference arose as a new way to incorporate prior information and test the theory (Shanken, 1987a; Harvey and Zhou, 1990; Kandel et al., 1995). In this setup, the CAPM can be tested using odds ratios. Moreover, prior information can be used for all parameters, instead of only for the correlation parameter introduced by Shanken (1987a). Regarding this parameter, Kandel et al. (1995) estimated the posterior distribution of the correlation between the proxy and the market portfolio; their results, however, show that this posterior distribution is sensitive to the choice of prior. More recent research has also used the Bayesian framework to test the model and deal with the imperfection of market return proxies (Cederburg and O'Doherty, 2015). The focus, however, is to econometrically control for the effects of the measurement error rather than to rely on the correlation between proxies and the market portfolio. In this sense, Cederburg and O'Doherty (2015) built a hierarchical Bayesian approach to estimate the CAPM that accounts for the error-in-variables problem; the authors find evidence in favor of the CAPM.
On the frequentist side, the literature appealed to instrumental variable estimation to deal with the measurement error problem; examples are Coën and Éric Racicot (2007), Meng et al. (2011), and Racicot et al. (2019) 2 . The challenge of this approach is to find an instrument that is sufficiently correlated with the proxy, since this variable has martingale properties. The traditional instrument used in the literature, the lagged endogenous variable, cannot explain the endogenous variable. The alternative the literature proposes is to use other instruments, such as technical instruments: Coën and Éric Racicot (2007) and Racicot et al. (2019), for instance, use higher moments to correct the measurement error. Their results show that failing to correct for the measurement error changes the inference about the theory's validity, demonstrating the importance of considering measurement error when estimating the CAPM. It should be noted, however, that higher-moment instruments can lead to serious problems in the presence of outliers.
The difficulty of finding instruments for the market portfolio connects the CAPM test to another literature: the strand of research that studies the selection or shrinkage of instruments when a data-rich environment is available (Ng and Bai, 2009). In general, when dealing with financial data we have access to a relatively large amount of information, which can form a large set of candidate instruments for the error-in-variables estimation.
In this sense, the literature on instrument selection is related to our paper. In particular, there is a branch of research that treats instrumental variable selection in a Bayesian framework (Hahn et al., 2018), which connects the two strands of literature discussed above. Since it is difficult to find good instruments in the context of CAPM estimation, one can consider all available instruments (non-technical, such as lagged variables, and technical, such as polynomials) and let the econometric method shrink the coefficients of less important instruments or select the best ones. Therefore, the literature on Bayesian instrumental variable estimation is also related to our proposal.
To our knowledge, no research uses shrinkage methods in the instrumental variable regression of the CAPM. This is exactly the gap our paper aims to fill. Such methods may help to better explain the endogenous variable and therefore produce a more realistic estimate of the CAPM, by correcting the problems caused by the measurement errors.

Methods and Data
The data-rich environment available in financial data sets allows us to use many instruments to correct the bias caused by measurement errors, even though these instruments are possibly weak. The many-instruments setting must be used carefully, since it can itself be a source of bias. To overcome this inconvenience, we need a regularization step, such as variable selection or shrinkage of less important parameters. Examples of regularization methods are LASSO, ridge, and Elastic Net, or Bayesian shrinkage priors, which penalize the number of covariates in some fashion.
In instrumental variable regression, it is desirable to use a method that jointly estimates the "two stages," and the Bayesian approach has this advantage: the regression of the treatment variable on the instruments and the regression of the target variable on the treatment variable can be estimated in a single step. In this sense, Bayesian shrinkage priors are preferred over other regularization methods. In particular, the factor-based prior proposed by Hahn et al. (2018) has the advantage of linearly combining the information in all the possibly weak instruments in such a way that, taken together, they become stronger. In the next subsection, we present this sort of shrinkage prior in the IV regression context.

Bayesian regularization methods in IV regression
When dealing with measurement error, instrumental variable regression may be used. Consider the two-equation model:

x_t = z_t'δ + ε_xt,  (5)
y_t = γ + β x_t + ε_yt,  (6)

where x_t is the endogenous (treatment) variable, z_t is a (p × 1) vector of instruments, y_t is the response variable, and it is supposed that

(ε_xt, ε_yt)' ~ N(0, Σ),  with  Σ = [σ_x^2, ρσ_xσ_y; ρσ_xσ_y, σ_y^2].

Since p may be large (possibly larger than the number of observations, p >> n), some regularization of equation (5) is necessary. The Bayesian solution to this problem is to impose shrinkage priors on δ so as to shrink those parameters that have little power to explain x_t. Under such a prior, the usual Gibbs sampler scheme (Lopes and Polson, 2014) used to estimate model (5)-(6) cannot be employed. Hahn et al. (2018) developed an elliptical slice sampler that can deal with an arbitrary prior on δ, allowing us to use shrinkage priors, such as the Laplace distribution, as well as the factor-based prior developed by the same authors. It is therefore instructive to describe the estimation for an arbitrary prior distribution on δ.

To understand the Bayesian estimation of the IV regression, consider the reduced form of equations (5) and (6):

x_t = z_t'δ + ν_xt,  (7)
y_t = γ + β z_t'δ + ν_yt,  (8)

where ν_xt ≡ ε_xt and ν_yt ≡ β ε_xt + ε_yt. Defining

T = [1, 0; β, 1],

so that (ν_xt, ν_yt)' = T (ε_xt, ε_yt)', implies that

y_t | x_t, z_t ~ N(γ + β x_t + α (x_t - z_t'δ), ξ^2),  (9)

with α ≡ (σ_y/σ_x) ρ and ξ^2 ≡ (1 - ρ^2) σ_y^2. Note that the parameters to be estimated are Θ = (σ_x^2, δ, ξ^2, γ, β, α). Then, conditional on the set of instruments, the likelihood function may be written as

f(x, y | Z, Θ) = f(x | Z, δ, σ_x^2) f(y | x, Z, Θ).  (10)

This decomposition of the likelihood function allows us to form a Gibbs sampler scheme by choosing the following prior distributions:

δ ~ π(δ)  (arbitrary, possibly a shrinkage prior),  (11)
σ_x^2 ~ Inverse-Gamma(k_x/2, s_x/2),  (12)
(θ, ξ^2) ~ Normal-Inverse-Gamma(0, Σ_0, k/2, s/2),  with θ ≡ (γ, β, α)'.  (13)

Combining these priors with the likelihood function in equation (10) gives us the posterior distribution. To sample from this posterior, it is possible to break it into three full conditional posteriors that form a Gibbs sampler scheme. To describe these three blocks, it is useful to introduce some definitions. Let x̃_t ≡ (1, x_t, x_t - z_t'δ)', let x̃ denote the matrix stacking the rows x̃_t', and define M ≡ Σ_0 + x̃'x̃, a ≡ k + n, and b ≡ s + y'y - y'x̃ M^{-1} x̃'y. It is possible to show that f(y | x, Z, δ) ∝ |M|^{-1/2} b^{-a/2}.
With these definitions we can describe each of these blocks.

Full conditional posterior for δ | Θ, data: given the remaining parameters, from equations (10) and (11) the conditional posterior is proportional to f(x | Z, δ, σ_x^2) f(y | x, Z, δ) π(δ). Since we are considering an arbitrary prior for δ, this full conditional posterior may not have a closed form, requiring alternative methods to sample from it. Although a traditional Metropolis-Hastings step could be used here, it scales poorly due to the possibly high dimension and multimodality of the full conditional posterior. Instead, Hahn et al. (2018) proposed sampling it with an elliptical slice sampler, which only requires the ability to evaluate π(δ). In outline (see Hahn et al., 2018, for the exact statement), the algorithm is:

Algorithm 1 Elliptical Slice Sampler
1: procedure SliceSampler(δ, σ_x^2, x, Z, y)
2:   Draw v ~ U(0, 1) and compute the slice level ℓ ≡ log f(y|x, Z, δ) + log π(δ) + log v
3:   Draw δ_0 from the first-stage Gaussian N(δ̂, σ_x^2 (Z'Z)^{-1}), with δ̂ ≡ (Z'Z)^{-1} Z'x
4:   Draw an angle φ ~ U(0, 2π) and set φ_min ← φ - 2π, φ_max ← φ
5:   repeat
6:     δ' ← (δ - δ̂) cos φ + (δ_0 - δ̂) sin φ + δ̂
7:     if log f(y|x, Z, δ') + log π(δ') > ℓ then
8:       accept δ'
9:     else if φ < 0 then φ_min ← φ else φ_max ← φ
10:    draw φ ~ U(φ_min, φ_max)
11:  until a proposal is accepted
12:  return δ'

Note that the only requirement of Algorithm 1 is the ability to evaluate the prior density π(δ), together with the collapsed likelihood f(y|x, Z, δ) derived above.
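For concreteness, a minimal Python sketch of this sampler follows. It is our own illustrative implementation, not the authors' code: the collapsed likelihood uses f(y|x, Z, δ) ∝ |M|^{-1/2} b^{-a/2} from above, a Laplace log-prior stands in for π(δ), and all parameter values are arbitrary.

```python
import numpy as np

def log_f_y(delta, x, Z, y, Sigma0, k, s):
    """Collapsed log-likelihood log f(y | x, Z, delta), up to a constant."""
    xtil = np.column_stack([np.ones_like(x), x, x - Z @ delta])  # rows (1, x_t, x_t - z_t'delta)
    M = Sigma0 + xtil.T @ xtil
    a = k + len(y)
    b = s + y @ y - y @ xtil @ np.linalg.solve(M, xtil.T @ y)
    return -0.5 * np.linalg.slogdet(M)[1] - 0.5 * a * np.log(b)

def log_prior(delta, scale=0.1):
    """Laplace (Bayesian LASSO) log-prior, up to a constant."""
    return -np.sum(np.abs(delta)) / scale

def ess_step(delta, sigma2x, x, Z, y, Sigma0, k, s, rng):
    """One elliptical slice sampling update of delta (our reconstruction)."""
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    delta_hat = ZtZ_inv @ Z.T @ x                        # first-stage mean
    L = np.linalg.cholesky(sigma2x * ZtZ_inv)
    delta0 = delta_hat + L @ rng.standard_normal(len(delta))
    log_target = lambda d: log_f_y(d, x, Z, y, Sigma0, k, s) + log_prior(d)
    ell = log_target(delta) + np.log(rng.uniform())      # slice level
    phi = rng.uniform(0.0, 2.0 * np.pi)
    phi_min, phi_max = phi - 2.0 * np.pi, phi
    while True:
        prop = (delta - delta_hat) * np.cos(phi) + (delta0 - delta_hat) * np.sin(phi) + delta_hat
        if log_target(prop) > ell:
            return prop                                  # accepted point on the ellipse
        if phi < 0.0:                                    # shrink the angle bracket and retry
            phi_min = phi
        else:
            phi_max = phi
        phi = rng.uniform(phi_min, phi_max)

# Toy run: 5 weak instruments, 200 observations
rng = np.random.default_rng(1)
Z = rng.standard_normal((200, 5))
delta_true = np.array([0.5, 0.0, 0.0, 0.3, 0.0])
x = Z @ delta_true + rng.normal(0.0, 1.0, 200)
y = 1.2 * x + rng.normal(0.0, 0.5, 200)
delta = np.zeros(5)
for _ in range(50):
    delta = ess_step(delta, 1.0, x, Z, y, np.eye(3), 1.0, 1.0, rng)
```

The shrinking-bracket loop always terminates, since the proposal approaches the current (in-slice) point as the angle goes to zero.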
Full conditional posterior for σ_x^2 | Θ, data: fortunately, for an inverse-gamma prior on σ_x^2, the full conditional posterior has a closed form. Combining the likelihood f(x | Z, δ, σ_x^2) with the prior given in (12), it is possible to show that the full conditional posterior is an Inverse-Gamma with shape parameter (k_x + n)/2 and scale parameter (s_x + Σ_{t=1}^{n} (x_t - z_t'δ)^2)/2 (see the derivation in the appendix).
Full conditional posterior for (ξ^2, γ, β, α)' | Θ, data: this block also has a closed form. Using the properties of the bivariate normal, we can write the likelihood in terms of the transformed regressor x̃_t, which gives y_t | x̃_t ~ N(x̃_t'θ, ξ^2). Combining this likelihood with the prior given in (13), it can be shown that the full conditional posterior of (ξ^2, γ, β, α)' is a Normal-Inverse-Gamma distribution. Specifically, with M, a, and b as defined above,

θ | ξ^2, data ~ N(M^{-1} x̃'y, ξ^2 M^{-1})  and  ξ^2 | data ~ Inverse-Gamma(a/2, b/2)

(see the appendix for the derivation). We can use these full conditional posteriors to form a three-block Gibbs sampler by iteratively sampling over the blocks. This methodology is interesting because we can choose arbitrary priors for δ and the sampler still works well. In particular, we can elicit several shrinkage priors over δ, since the many-instruments setting requires regularization.
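This conjugate block can be sketched in a few lines of Python (our own illustration; the hyperparameters Σ_0 = I, k = s = 1 and all simulation settings are arbitrary choices):

```python
import numpy as np

def draw_theta_xi2(x, Z, y, delta, Sigma0, k, s, rng):
    """One draw of (theta, xi^2) from the Normal-Inverse-Gamma full conditional."""
    xtil = np.column_stack([np.ones_like(x), x, x - Z @ delta])
    n = len(y)
    M = Sigma0 + xtil.T @ xtil
    M_inv = np.linalg.inv(M)
    theta_bar = M_inv @ xtil.T @ y                 # posterior mean of theta
    a = k + n
    b = s + y @ y - y @ xtil @ theta_bar           # posterior scale
    xi2 = (b / 2.0) / rng.gamma(a / 2.0)           # Inverse-Gamma(a/2, b/2) draw
    theta = rng.multivariate_normal(theta_bar, xi2 * M_inv)
    return theta, xi2

rng = np.random.default_rng(2)
Z = rng.standard_normal((300, 4))
delta = np.array([0.4, 0.0, 0.2, 0.0])
x = Z @ delta + rng.normal(0.0, 1.0, 300)
y = 0.1 + 1.2 * x + rng.normal(0.0, 0.5, 300)
theta, xi2 = draw_theta_xi2(x, Z, y, delta, np.eye(3), 1.0, 1.0, rng)
# theta stacks (gamma, beta, alpha); xi2 is the conditional error variance
```

With this toy data the second component of θ concentrates near the true slope 1.2, which is the β the CAPM application ultimately cares about.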

Shrinkage priors for δ
There is a large range of shrinkage priors in the literature (Van Erp et al., 2019). The underlying idea of these priors is to place a high prior probability around zero, so that if a parameter is not important enough, it is shrunk toward zero. In what follows, we present some of these priors that can be directly applied to δ = (δ_1, ..., δ_p). Then, we proceed to the factor-based prior distribution.

Heavy-tailed priors
Popular choices of shrinkage priors are the Cauchy, double-exponential (also known as Laplace), and horseshoe densities (Carvalho et al., 2010). The horseshoe density is the strongest, followed by the double-exponential. Although the Cauchy density is relatively weaker, it also concentrates a lot of density around zero, and so it also works as a shrinkage prior. Figure 1 depicts these three priors for a single parameter. In the IV regression case, we can choose one of these priors for each δ_j and assume that δ_i and δ_j are independent for all i ≠ j, with i, j ∈ {1, ..., p}. Although this may work, it neglects the covariance between the instruments. Taking this covariance into account requires a more sophisticated prior, which is the subject of the next subsection.
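The horseshoe has no closed-form density, but it is easy to sample through its scale-mixture representation, δ_j | λ_j ~ N(0, λ_j^2 τ^2) with λ_j ~ C+(0, 1). The sketch below (illustrative; τ = 1 and the 0.1 threshold are arbitrary choices) draws from the three priors and exhibits the ordering of mass near zero described above:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
tau = 1.0

cauchy = rng.standard_cauchy(n)                   # Cauchy(0, 1)
laplace = rng.laplace(0.0, 1.0, n)                # double-exponential (Laplace)
lam = np.abs(rng.standard_cauchy(n))              # half-Cauchy local scales
horseshoe = rng.normal(0.0, 1.0, n) * lam * tau   # horseshoe scale mixture

for name, draws in [("cauchy", cauchy), ("laplace", laplace), ("horseshoe", horseshoe)]:
    print(name, np.mean(np.abs(draws) < 0.1))     # fraction of draws near zero
```

The horseshoe places the most mass near the origin while keeping heavy tails, which is why it shrinks weak instruments aggressively without overly penalizing strong ones.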

Factor-based shrinkage prior
The idea underlying the factor-based prior proposed by Hahn et al. (2018) is to exploit the covariance of the instruments to extract factors that represent 'strong' instruments. To formalize this intuition, consider the following decomposition of the covariance matrix of the instruments:

Cov(z_t) = BB' + Ψ^2,  (14)

where B is a (p × k) matrix and Ψ^2 is a diagonal (p × p) matrix. Although every covariance matrix admits this decomposition, the interest here is in the case where k << p, where k is the number of factors, denoted f_t, to be extracted. Suppose that the instruments z_t and the factors f_t are jointly normally distributed as follows:

(f_t', z_t')' ~ N(0, [I_k, B'; B, BB' + Ψ^2]).

This assumption implies that E[f_t | z_t] = A z_t =: f̂_t, with A ≡ B'(BB' + Ψ^2)^{-1}. Now, consider the factor regression model:

x_t = θ f̂_t + ε_xt,  (15)

where θ is (1 × k). From equation (15) and the definition of f̂_t, it is possible to show that δ' = θA. However, this specification is correct only if δ lies in the row space of A; otherwise, the model is misspecified. It is therefore necessary to extend the model to allow for the possibility that δ does not lie in the row space of A. To that end, the specification in equation (15) is modified to

x_t = θ f̂_t + η r̃_t + ε_xt,  (16)

where η is a (1 × p) vector of parameters, r̃_t ≡ (I_p - A^+A) z_t, and A^+ denotes the pseudo-inverse of A. In this case, it can be shown that δ' = θA + η(I_p - A^+A). Defining δ̃' ≡ (θ, η), we note that δ = Hδ̃, where

H ≡ [A', (I_p - A^+A)']

is a p × (k + p) matrix. Consequently, we can rewrite (16) as x_t = z_t'Hδ̃ + ε_xt. Assuming we know A (and hence H), this specification allows us to place a prior on δ by imposing a strong shrinkage prior on δ̃. Solving the system δ = Hδ̃ using the theory of pseudo-inverses, we have

δ̃ = H^+δ + (I_{p+k} - H^+H) ω,

for an arbitrary vector ω. With this identity, conditional on ω, imposing a horseshoe prior on δ̃ induces a prior on δ. That is:

π(δ | ω) ∝ π_δ̃( H^+δ + (I_{p+k} - H^+H) ω ),

where π_δ̃ denotes the horseshoe prior over δ̃. Following Hahn et al. (2018), we assume that ω ~ N(0, I_{k+p}).
Once ω is known, we can evaluate the prior π(δ | ω), which is the only requirement of the slice sampler presented in Algorithm 1. We can then sample δ by inducing a prior on it via the horseshoe prior over δ̃. Notice that, under this specification, the factor structure derived in this section is taken into account in the prior over δ. In practice, however, the matrices B and Ψ are unknown and, consequently, A and H are also unknown. Instead of estimating them in a Bayesian fashion, we use point estimates of these matrices, obtained by choosing a diagonal, positive-definite matrix D that minimizes the trace of Cov(z_t) - D.
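The linear algebra behind the mapping δ = Hδ̃ can be checked numerically. The sketch below (our own illustration, with arbitrary B and Ψ) builds A and H and verifies that δ̃ = H^+δ + (I - H^+H)ω reproduces any δ, for any choice of ω:

```python
import numpy as np

rng = np.random.default_rng(4)
p, k = 10, 2

B = rng.standard_normal((p, k))                      # factor loadings
Psi2 = np.diag(rng.uniform(0.5, 1.5, p))             # idiosyncratic variances
A = B.T @ np.linalg.inv(B @ B.T + Psi2)              # E[f_t | z_t] = A z_t, shape (k, p)

A_pinv = np.linalg.pinv(A)
H = np.hstack([A.T, np.eye(p) - A_pinv @ A])         # p x (k + p) mapping, delta = H delta_tilde

# delta_tilde = H^+ delta + (I - H^+ H) omega reaches any delta
delta = rng.standard_normal(p)
omega = rng.standard_normal(k + p)
H_pinv = np.linalg.pinv(H)
delta_tilde = H_pinv @ delta + (np.eye(k + p) - H_pinv @ H) @ omega

print(np.allclose(H @ delta_tilde, delta))  # True: the identity holds for any omega
```

Because H has full row rank (the row space of A and its orthogonal complement together span R^p), every δ is reachable, so the prior on δ̃ induces a proper prior on δ.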

Data description
To analyze whether our empirical method performs well, we start by applying it to simulated data in a Monte Carlo analysis. Since we know the true generating process, we can calculate the estimation error and compare it with alternative methodologies (for instance, OLS, 2SLS, LASSO, etc.). Besides the simulation exercise, we also apply the method to real financial data. To estimate the CAPM, we need asset return data, a surrogate for the market return, and a risk-free asset. As the risk-free asset, we consider the one-month Treasury bill rate, taken from Kenneth French's website 4 . We consider the S&P 500 returns as the surrogate for the market return. Finally, we consider 200 assets listed in the S&P 500 as the asset returns. The data range from 2020-03-23 to 2021-12-31, totaling 451 observations. This choice of a relatively short sample period and a high-dimensional set of instruments represents the context in which the measurement error in the market portfolio is most relevant, and also where the bias introduced by the dimensionality of the instruments can be strongest.

Results
In this section, we describe and discuss our results. We begin with a simulation exercise in which we compare the Bayesian regularization discussed in the previous section with traditional ordinary least squares and two-stage least squares estimation, considering two scenarios of simulated data. Finally, we use the observed data presented above to estimate the CAPM using our proposed procedure.

Monte Carlo analysis: Simulation procedures
We consider two scenarios to simulate the CAPM. In the first, we simulate asset returns and build the true market return from them. In the second, we use a traditional classical measurement error model. In both cases, we use descriptive statistics of the observed data to calibrate the parameters, so that the simulated data are a good approximation of the real data.

First strategy: Market return as weighted sum of assets return
In the first simulation exercise, we consider that the true market return is a weighted average of all M assets in the economy, similar to the simulation used in Simmet and Pohlmeier (2020). In this case, we need to specify the weight b_i of each asset i. With these weights, and assuming that each asset return is Gaussian, we can build the true market return x*_t, which, in turn, is used to build the observed market return by adding an error u_t. Specifically, we simulate the CAPM model following the steps below.
We begin by drawing the weights b_i from uniform distributions: half of the b_i are drawn from a U(0, 1) and the other half from a U(1000, 1001), so that we can differentiate between assets with more weight. We normalize the weights so that Σ_{i=1}^M b_i = 1. Then we simulate the asset returns from a Gaussian distribution with mean zero and variance σ²_ε. We build the true market return x*_t = Σ_{i=1}^M b_i y_it and assume that the total number of assets in the economy is M ∈ {20, 80, 140}. The observed market return is x_t = x*_t + u_t, where u_t ∼ N(0, 0.001). In this case, although we do not know the values of the true β's, we can estimate them by regressing y_it on the true market return x*_t. We can use the OLS estimates of this regression as the true β's, since OLS is a consistent estimator for β_i, even though it is subject to sampling error. We generate 10^4 observations to obtain this consistent estimate, although we use only 500 observations for the comparison of the estimators.
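The steps above can be sketched as follows. The asset-return standard deviation used here is an assumption for illustration, since the paper calibrates it from descriptive statistics of the observed data, and we read N(0, .001) as specifying the error variance.

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 20, 500               # number of assets, observations used for comparison
sigma_eps = 0.03             # illustrative asset-return std (an assumption)

# Step 1: half the weights from U(0, 1), half from U(1000, 1001),
# then normalize so they sum to one.
b = np.concatenate([rng.uniform(0, 1, M // 2),
                    rng.uniform(1000, 1001, M - M // 2)])
b /= b.sum()

# Step 2: simulate Gaussian asset returns and build the true market return.
y = rng.normal(0.0, sigma_eps, size=(T, M))
x_star = y @ b

# Step 3: contaminate the true market return with measurement error.
x = x_star + rng.normal(0.0, np.sqrt(0.001), size=T)
```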
We repeat this simulation procedure 100 times, and at each iteration we estimate β using OLS, TSLS, and Bayesian IV with a shrinkage prior over δ. We consider three types of priors for δ: i) the horseshoe; ii) the Laplace, which is known as the Bayesian LASSO; and iii) the factor-based shrinkage prior explained in section 4.2.2. For all instrumental variable estimations, the set of instruments is formed by all assets except the regressand in the CAPM equation. We believe that these variables satisfy the requirements of an instrument: they are correlated with the market return but uncorrelated with the error term in the CAPM equation. Table 1 presents the descriptive statistics for the true betas and the betas estimated using five different methods. The main result in Table 1 is that while the distribution of the TSLS estimator shifts dramatically to the left relative to the distribution of the true beta (column 1), the Bayesian instrumental variable estimator does not accompany this shift (columns 4-6). All three Bayesian estimations present similar results, meaning that the effects of the three sorts of priors over the instrumental variable coefficients are equivalent in this simulation. This result remains valid for all numbers of assets, M, considered.
The true beta in the first column of Table 1 is distributed around one. Because of the measurement error introduced, the OLS estimator is smaller than the true beta. Although the Bayesian instrumental variable estimator improves on the two-stage least squares estimator, the mean of the beta estimated by the BIV approach is slightly smaller than the OLS estimator, especially for large sets of instruments (M = 80 and M = 140). In this sense, the estimation of the model remains problematic.
To further investigate the improvement of the Bayesian estimator relative to the traditional two-stage least squares, we analyze the mean error (ME) and the root mean squared error (RMSE); the results are presented in Table 2. Both in terms of ME and RMSE, the Bayesian approach improves on the TSLS. This gain in accuracy, however, is attenuated as the number of instruments becomes large. For M = 20, the mean error improves by around 90% and the root mean squared error by around 83%. When we consider M = 140 assets, these gains diminish to 40% and 35% for the ME and RMSE, respectively.
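For concreteness, the two accuracy measures can be computed as in this minimal sketch (the function names are ours, not the paper's):

```python
import numpy as np

def mean_error(beta_hat, beta_true):
    """Average signed deviation of the estimates from the true betas."""
    return float(np.mean(np.asarray(beta_hat) - np.asarray(beta_true)))

def rmse(beta_hat, beta_true):
    """Root mean squared error of the estimates."""
    diff = np.asarray(beta_hat) - np.asarray(beta_true)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Note that the ME preserves the sign of the bias (negative for a downward-biased estimator), while the RMSE also penalizes the dispersion of the estimates.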

Second strategy: Classical measurement error
Consider a classical additive measurement error model:

y_it = β_i x*_t + ε_it,  (17)
x_t = x*_t + u_t,  (18)

for i ∈ {1, ..., M} and t ∈ {1, ..., 1000}. In equations (17) and (18), x*_t is the true market return, assumed to be Gaussian with mean zero and variance σ²_x; u_t is a Gaussian measurement error with variance σ²_u; and x_t is the observed market return. The sensitivity to the market return, measured by β_i, is assumed to be known, and based on these values and the error term ε_it we construct the asset returns. The error term ε_it is Gaussian, with first and second moments E[ε_it] = 0 and E[ε²_it] = σ²_ε, respectively.
To simulate the model, we need to calibrate the parameters (σ²_u, σ²_x, σ²_ε, β). For β, we consider a linear grid between 0.5 and 1.5. Based on data for a proxy of the market return, we calibrated σ_x = 0.01. We calibrated half of the assets with σ_ε = 0.03 and the other half with σ_ε = 0.04. These two values are chosen based on the distribution of the standard deviations of the assets listed in the S&P 500 and are intended to create strong and weak instruments: a lower standard deviation creates assets that are stronger instruments than those with a higher standard deviation. Finally, we calibrate σ_u = 0.9 σ_x to create a situation with high measurement error while ensuring the model is identifiable. These parameters allow us to simulate all variables of interest in the CAPM.
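Under these calibrated values, the second simulation strategy can be sketched as follows (a minimal reading of the model in equations (17) and (18); the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
M, T = 20, 1000
beta = np.linspace(0.5, 1.5, M)              # linear grid of true betas
sigma_x = 0.01                                # calibrated market-return std
sigma_u = 0.9 * sigma_x                       # high but identifiable error
sigma_eps = np.where(np.arange(M) < M // 2, 0.03, 0.04)  # strong/weak split

x_star = rng.normal(0.0, sigma_x, size=T)     # true market return
x = x_star + rng.normal(0.0, sigma_u, size=T) # observed market return
eps = rng.normal(0.0, 1.0, size=(T, M)) * sigma_eps
y = x_star[:, None] * beta[None, :] + eps     # asset returns
```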
To evaluate the accuracy of each method, we simulated the model N_sim = 100 times. At each iteration, we estimated the parameters using five methods. The first was traditional Ordinary Least Squares (OLS), regressing y_it on x_t, which is known to be inconsistent in the presence of measurement error. Second, we considered an instrumental variable approach, estimating a Two-Stage Least Squares (TSLS) with the set of instruments formed by all assets except the regressand in the CAPM equation. We believe that these variables satisfy the requirements of an instrument: they are correlated with the market return but uncorrelated with the error term in the CAPM equation. Finally, we considered the same set of instruments to estimate the model using the Bayesian method described in section 3, using the Horseshoe, Laplace, and Factor-Based Shrinkage prior distributions over δ. These estimations are referred to as BIV-HS, BIV-LASSO, and BIV-FBS, respectively, since the Laplace prior is known as the Bayesian LASSO.
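A bare-bones sketch of the first two estimators, OLS and TSLS with all other assets as instruments, might look like this. It is an illustration under the setup above, not the paper's actual implementation.

```python
import numpy as np

def ols_beta(x, y):
    """Slope of an OLS regression of y on x (with intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

def tsls_beta(x, y, Z):
    """Two-stage least squares: project x on the instruments Z first,
    then regress y on the first-stage fitted values."""
    Z1 = np.column_stack([np.ones(len(x)), Z])
    coef, *_ = np.linalg.lstsq(Z1, x, rcond=None)
    x_hat = Z1 @ coef            # first stage
    return ols_beta(x_hat, y)    # second stage
```

A usage pattern consistent with the text is `tsls_beta(x, y[:, j], np.delete(y, j, axis=1))`, using every asset except the regressand j as an instrument.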
We simulate and estimate the model for three different numbers of assets, 20, 80, and 140, as shown in Table 3. In the estimation process, we consider the asset with β = 1. Because of the measurement error, the OLS estimator is downward biased. When we consider 20 instruments, all instrumental variable estimators achieve a mean value close to the true beta, with BIV-HS performing best.
Increasing the number of instruments to 80, all instrument-based estimators lose performance, but BIV-HS is still the closest to the true value of one. The other Bayesian estimators, based on the Laplace and factor-based shrinkage priors, however, are worse than the two-stage least squares on average. Finally, when the number of instruments is 140, all estimators get closer to the OLS estimator, and in this case no Bayesian estimator beats the two-stage least squares.
We also compare measures of accuracy based on the mean error (ME) and the root mean squared error (RMSE), which are presented in Table 4. These metrics are presented relative to the TSLS, so that values less than one mean that the estimator outperforms the TSLS, while values greater than one mean that it underperforms the TSLS. The results in Table 4 are in line with those presented in Table 3: the only Bayesian prior able to beat the TSLS is the horseshoe, and only up to 80 instruments. Specifically, the BIV-HS produced a gain of 34% in the mean error for 20 instruments and 20% for 80 instruments. These gains are attenuated when we consider the RMSE measure.

Empirical Application
In this section, we use the Bayesian regularization procedure to estimate the CAPM using observed data. We estimate the model for three arbitrary assets: Agilent Technologies, eBay, and Amazon. The set of instruments consists of all the other 200 stock returns listed in the S&P 500, and for each the data are available for the period between 2020-03-23 and 2021-12-31. We estimate the model using the three types of priors over the instrumental variable coefficients, as well as the traditional OLS and TSLS.
The posterior distribution of the CAPM beta was similar for the three Bayesian estimations (BIV-HS, BIV-LASSO, and BIV-FBS), indicating that, for these stocks, the three priors perform equally well. In the estimation process, the Gibbs schemes required many iterations to converge to the stationary distribution. Specifically, they required 10^4 iterations to warm up and 10^7 iterations, saving every 100th draw, to reduce the autocorrelation of the Markov chain.
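A generic burn-in-and-thinning loop of the kind described can be sketched as follows; `gibbs_step` is a stand-in for the paper's actual conditional updates, and the default sizes mirror the text:

```python
import numpy as np

def run_chain(gibbs_step, init, n_warmup=10_000, n_keep=100_000, thin=100):
    """Run a Gibbs sampler: discard `n_warmup` warm-up draws, then keep
    one draw out of every `thin` iterations to reduce autocorrelation."""
    state = init
    for _ in range(n_warmup):          # warm-up draws are discarded
        state = gibbs_step(state)
    draws = []
    for _ in range(n_keep):
        for _ in range(thin):          # thinning: advance without saving
            state = gibbs_step(state)
        draws.append(state)
    return np.asarray(draws)
```

With `n_keep=100_000` and `thin=100` this corresponds to the 10^7 post-warm-up iterations of which every 100th is saved.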
When comparing the CAPM betas estimated by the Bayesian IV approaches with those found by the OLS and TSLS estimators, there is a difference, especially for the Amazon stock. Figure 2 presents the posterior distribution of the beta estimated by BIV-HS, as well as its mean, which is used as a point estimate for comparison with the OLS and TSLS estimates. For all three stocks, the TSLS and OLS estimates are similar. This is an expected result, since when the number of instruments is large (as in this case, p = 200) relative to the number of observations (n = 451), the TSLS estimates tend to the OLS estimates. In addition, since the market return contains measurement error, we know that the OLS estimates are downward biased and, consequently, so are the TSLS estimates. The Bayesian approach, however, delivers a greater CAPM beta in all three cases (see the dashed red line in Figure 2). For the Agilent Technologies stock, while β̂_OLS = 0.86 and β̂_TSLS = 0.86, the posterior mean of the beta from BIV is 0.89. For the eBay stock, while β̂_OLS = 0.72 and β̂_TSLS = 0.73, the posterior mean is 0.80. Finally, for the Amazon stock, while β̂_OLS = 0.77 and β̂_TSLS = 0.73, the posterior mean is 1.21. Although we do not know the true beta in this case, we know that OLS is downward biased, which puts our approach in a better position.
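The claim that TSLS approaches OLS when the number of instruments is close to the number of observations can be checked numerically. In the sketch below the instruments are deliberately pure noise, yet with p close to n the first stage still fits the mismeasured regressor almost perfectly, so the TSLS estimate is dragged toward the attenuated OLS estimate rather than the true slope of one. All values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 90                          # instruments almost as many as observations
x_star = rng.normal(size=n)             # true (unobserved) regressor
x = x_star + rng.normal(size=n)         # observed regressor, heavy measurement error
y = 1.0 * x_star + 0.5 * rng.normal(size=n)   # true slope is 1
Z = rng.normal(size=(n, p))             # deliberately irrelevant instruments

def slope(a, b):
    ac, bc = a - a.mean(), b - b.mean()
    return float(ac @ bc / (ac @ ac))

# First stage: with p close to n, Z mechanically fits x almost perfectly,
# so the fitted values barely differ from x itself.
Z1 = np.column_stack([np.ones(n), Z])
x_hat = Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]

b_ols, b_tsls = slope(x, y), slope(x_hat, y)
```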
These discrepancies in the estimated betas may have drastic implications for finance practitioners. As noted by Malloch et al. (2020), some analyses in finance, such as valuation, are very sensitive to the estimated beta. Thus, the many-instruments approach with shrinkage priors can offer a better way to conduct such financial analyses.

Final Remarks
This paper contributes to the literature that addresses Roll's (1977) critique, in particular the strand of literature that uses instrumental variables. The data-rich environment available in finance offers a wide range of instruments to address the error-in-variables problem present in the capital asset price model. However, these instruments are usually weakly correlated with the endogenous market return, and using too many instruments may induce bias. This paper proposes to estimate the capital asset price model using a large set of instruments and shrinkage priors over the parameters associated with the instruments.
In a simulation exercise, the results show that the proposed approach reduces the bias in the estimation of the CAPM. In an empirical application, we verify that our approach produces greater betas than the OLS and TSLS methods. These results seem to put the Bayesian regularization approach in a better position when estimating the CAPM with many instruments. A word of caution, however, is needed, since even the Bayesian approach loses performance as the dimensionality increases.

Appendix 2: Results for small samples
This appendix presents results analogous to those presented in section 5.1.2, but for small samples. Tables 5 and 6 show the corresponding results of Tables 3 and 4 for a smaller number of observations, namely n = 500.