A New Estimator for Standard Errors with Few Unbalanced Clusters

In linear regression analysis, the estimator of the variance of the estimator of the regression coefficients should take into account the clustered nature of the data, if present, since using the standard textbook formula will in that case lead to a severe downward bias in the standard errors. This idea of a cluster-robust variance estimator (CRVE) generalizes to clusters the classical heteroskedasticity-robust estimator. Its justification is asymptotic in the number of clusters. Although an improvement, a considerable bias could remain when the number of clusters is low, the more so when regressors are correlated within cluster. In order to address these issues, two improved methods were proposed; one method, which we call CR2VE, was based on biased reduced linearization, while the other, CR3VE, can be seen as a jackknife estimator. The latter is unbiased under very strict conditions, in particular equal cluster size. To relax this condition, we introduce in this paper CR3VE-λ, a generalization of CR3VE where the cluster size is allowed to vary freely between clusters. We illustrate the performance of CR3VE-λ through simulations and we show that, especially when cluster sizes vary widely, it can outperform the other commonly used estimators.


Introduction
In linear regressions with clustered data, it is common practice to estimate the variance of the estimated parameters using the cluster-robust variance estimator (CRVE from hereon) introduced by Liang and Zeger (1986), as a generalization of the White (1980) heteroskedastic-robust estimator. The justification is asymptotic, with number of clusters tending to infinity. Bell and McCaffrey (2002) show that in a finite context, with few clusters and error terms that are correlated within cluster, CRVE leads to severely downwardbiased standard errors and thus to misleading inference about the estimated parameters. Moulton (1986Moulton ( , 1990 and Cameron and Miller (2015) point out that this issue is particularly relevant for regressors that are correlated within cluster such as policy variables that are implemented only in certain regions or states. An additional issue for inference about the estimated parameters is that, under the null hypothesis and with few clusters, the distribution of the test statistic is unknown and approximate normality cannot be claimed.
Following Bell and McCaffrey (2002), inferences about the estimated parameters can be improved by (i) reducing the bias of CRVE with either BRL (bias reduced linearization), also known as CR2VE, or the jackknife estimator v JK , also known as CR3VE, both based on transformed OLS residuals; CR2VE and CR3VE generalize, using clustered data, the heteroskedasticity-consistent covariance estimators HC2 and HC3, introduced by MacKinnon and White (1985). Inference about the estimated parameters can be also improved by (ii) approximating the distribution of the test statistic with the t-distribution with an extension of the Satterthwaite (1946) degrees of freedom (DOF) that are data-determined and regressor-specific. Imbens and Kolesar (2016) developed a more refined version of the data-determined regressor-specific DOF used by Bell and McCaffrey (2002). Bell and McCaffrey (2002) also show that CR3VE tends to overestimate the standard errors. In this paper, we introduce CR3VE-λ, a cluster-robust variance estimator that is identical to CR3VE in the case of balanced clusters but, in the case of unbalanced clusters, takes the difference in cluster sizes into account such that the computed standard errors are less conservative and unbiased under more general conditions. The paper is organized as follows. In Section 2, we discuss basic theory on CRVE, CR2VE and CR3VE. In Section 3, we introduce CR3VE-λ. In Section 4, we illustrate and test the performance of CRVE, CR2VE, CR3VE and CR3VE-λ to compute standard errors with few clusters using Monte Carlo simulations. In Section 5, we present ideas for future research related to the current paper. Section 6 concludes the paper.

Basic Theory: CRVE, CR2VE and CR3VE
Consider the regression model y = Xβ + ε with observations that can be grouped into C clusters of size n 1 , . . . , n C ; ∑ c n c = n. Write, for the c-th cluster, y c = X c β + ε c , with E(ε c ) = 0 and var(ε c ) = V c . The V c 's are collected in the block-diagonal matrix V. After OLS we have (1) An intuitively appealing cluster-robust variance estimator (CRVE) based on OLS residuals per clusterε c is This estimator, which directly generalizes White (1980) and was introduced by Liang and Zeger (1986), is consistent when the number of clusters goes to infinity. The same holds when (2) is scaled, as in Stata, by the factor C(n − 1)/(C − 1)(n − k), with k the number of regressors. Since this factor is larger than one, it increases the estimated variance.
In the case of few clusters, asymptotics will be a poor guide. In what follows, we therefore consider its bias instead. Let M = I n − X(X X) −1 X , let S c be the n × n c matrix that selects the columns of M corresponding to cluster c, let L c ≡ MS c and let There holds H c = L c L c since M is idempotent and symmetric. Withε = Mε andε c = L c ε, To reduce the bias, consider choosing a variance estimator based on transformed residuals ε c ≡ A cεc , for some A c . Then From (1), unbiasedness requires the A c to be such that A c L c VL c A c = V c for all c uniformly in the V c . This is infeasible and therefore we consider two second-best solutions.
The first solution is to consider the case of no cluster effects, V c = σ 2 I n c for all c, and make the estimator unbiased for this case.
The variance estimator is unbiased if A c H c A c = I n c and so we choose A c = H − 1 /2 c . This estimator, introduced by Bell and McCaffrey (2002) and called BRL, is extensively discussed by Cameron and Miller (2015) and it is also known as CR2VE.
The second solution is based on the idea that the elements in M outside the blocks on the diagonal may be small. Then L c can be approximated by a matrix with H c as its c-th block and zeros outside this block. Then L c VL c = H c V c H c and choosing A c = H −1 c leads, when scaled by a factor (C − 1)/C, to an estimator that is approximately unbiased when there are no cluster effects. This estimator with the jackknife correction is also introduced by Bell and McCaffrey (2002), who called it v JK , it is discussed by Cameron and Miller (2015) and it is also known as CR3VE. CR2VE and CR3VE can be computationally intensive because they require the inversion of matrices of order equal to the cluster sizes. CR2VE and CR3VE can be computed efficiently, that is, with computing time and storage of order O(n c ); a succinct proof is given by Niccodemi et al. (2020).
Both CR2VE and CR3VE are used in the literature as an alternative to bootstrapping. The bootstrap literature has evolved rapidly since Cameron et al. (2008) proposed the use of a wild cluster bootstrap procedure to improve inference in the case of few clusters. Generally, the wild cluster bootstrap procedure performs well. However, MacKinnon and Webb (2017) show that inference based on this procedure can fail in the case of dummy regressors equal to zero or one in very few clusters. Djogbenou et al. (2019) propose an asymptotic analysis of cluster-robust inference mainly focused on the wild cluster bootstrap procedure, proving its asymptotic validity under certain conditions on the cluster sizes. They show, both theoretically and through some experiments, how variation in cluster sizes affects the asymptotic validity of this procedure and they conclude that the wild cluster restricted bootstrap using the Rademacher distribution performs better than any other competitors.

From CR3VE to CR3VE-λ
To analyze the bias of CR3VE we scale (4) by (C − 1)/C and use When clusters are balanced and have the same covariance structure then X c X c = X X/C for all c, and (5) reduces to E[ var(β)] = σ 2 (X X) −1 . Thus, in the case of balanced clusters, CR3VE with the correction factor (C − 1)/C is unbiased. We propose a different scaling factor than (C − 1)/C for CR3VE in the more general case of unbalanced clusters that still have the same covariance structure. Define π c ≡ n c /n for cluster c. Then X c X c = π c X X and the expression in parentheses in (5) becomes λ(X X) −1 , with and λ ≥ C/(C − 1), with equality holding in the case of balanced clusters. To see this, let π ≡ (π 1 , . . . , π C ) , Π ≡ diag(π), a ≡ (I C − Π) − 1 /2 π and b ≡ ( . This suggests that 1/λ may be a better scaling factor than (C − 1)/C. As 1/λ ≤ (C − 1)/C, we propose a lower estimate of the variance than with CR3VE. This fits in well with the observation by Bell and McCaffrey (2002), as mentioned in the Introduction, that CR3VE tends to overestimate the standard errors. We denote this estimator, which is unbiased under more general conditions than CR3VE, by CR3VE-λ.

Monte Carlo Simulations
We run several sets of Monte Carlo (MC) simulations and compare the bias of the standard errors based on unclustered standard errors (UN), CRVE, CR2VE and CR3VE with the bias of the standard errors based on CR3VE-λ. In each simulation, we generate randomly C unbalanced clusters with number of observations per cluster n c ∼ U{1000 − g, 1000 + g}, where g is different in each set of simulations. In other words, n c is drawn from a uniform distribution with constant mean but standard deviation that depends on g. We generate our dependent variable y hc = α + βx hc + γd c + e hc , where h identifies the single observation (e.g., household) and c identifies the C clusters of size n c = n 1 , . . . , n C , and where x hc = q hc + z c and e hc = w hc + u c . Moreover, q hc , z c , w hc , u c are independently drawn from N(0, 1), α = 0 and β = γ = 1, and d c is a dummy variable constant within cluster and randomly constrained, in each simulation, to be equal to 1 in half of the randomly generated clusters. The simulation set-up is somewhat similar to the one in Cameron et al. (2008). As pointed out by Cameron and Miller (2015), unclustered standard errors and CRVE are likely to be severely biased if the cluster effect and the correlation of the regressors within cluster are different from zero. Therefore, we set up experiments that allow both e hc and the regressors to be correlated within cluster, including the extreme case of d c , a dummy variable that is constant within cluster. The presence of regressors correlated within cluster implies that the assumption under which CR3VE and CR3VE-λ are unbiased are not met. Yet, CR3VE-λ takes into account the difference in cluster size and, as this difference increases, it is expected to be less biased than CR3VE.
We run 100,000 simulations for each MC set and each MC set differs with respect to the number of clusters C and g. We show results for C = 4 and C = 6, and for g = 0 (i.e., balanced clusters), g = 250, g = 500, g = 900 and g = 990, with standard deviation of the cluster size equal to 0, 145, 289, 520 and 572, respectively. For each simulation: (i) we compute the true standard deviation ofβ, sd(β), based on and where β = (α, β, γ), (ii) we compute the standard errors ofβ and ofγ based on the different methods se UN , se CRVE , se CR2VE , se CR3VE and se CR3VE−λ , (iii) we compute the difference between the standard errors based on the different methods and the true standard deviations sd(β) and sd(γ). Finally, for each MC set we compute the mean of this difference (i.e., the estimated bias) for each method to compute the standard errors.
From Tables 1 and 2 we can see that CR3VE-λ always leads to the least biased standard errors, with estimated bias always close to zero. Moreover, it remarkably reduces the estimated bias of CR3VE with high unbalancedness. This is especially true for the dummy variable d i . We acknowledge that the reader might be particularly interested in comparing the inferential performance of the various CRVEs, including CR3VE-λ, especially in a real-data setting. For this purpose we refer the reader to Niccodemi et al. (2020), where inferential results based on the Current Population Survey data clustered in few, highly unbalanced clusters and the t-distribution using the Imbens and Kolesar (2016) DOF are reported. This experiment is similar to the one developed by Cameron and Miller (2015), although more focused on cluster unbalancedness. According to the results, with few, highly unbalanced clusters CR3VE-λ appears to be among the most promising methods for inference, as CR3VE tends to underreject a true null hypothesis.

A Note on Future Research
Future research on cluster-robust variance estimators, directly linked to the current work, might take at least two directions. First, Djogbenou et al. (2019) show through some experimental designs how the variation in cluster sizes affects the asymptotic validity of the wild cluster bootstrap. Testing how CR3VE-λ performs, in comparison to CR2VE and CR3VE and using the same experimental designs, might provide further elements to evaluate its performance.
Second, the effective number of clusters introduced by Carter et al. (2017) might be of particular interest for CR3VE-λ. The effective number of clusters depends, among others, on the cluster sizes. If the effective and the nominal number of clusters differ remarkably, and if this difference is, to some extent, due to heterogeneity in cluster sizes, then inference using CR3VE-λ might be much more accurate then inference based on CR3VE. Therefore, it would be interesting to develop experiments that focus on the interaction between the effective number of clusters as a diagnostic tool and the use of CR3VE-λ instead of CR3VE for inference. Of course, other possibilities include the use of the effective number of clusters to construct the scaling factor for CR3VE and the introduction of measures of the effective size of the clusters to compute CR3VE-λ.

Conclusions
We propose CR3VE-λ, an estimator for clustered standard errors that improves the jackknife estimator and is unbiased under more general conditions in the case of few unbalanced clusters. In simulations, CR3VE-λ reduces the bias of CR3VE as the unbalancedness of the clusters increases. We also provide a reference to a longer working paper (i.e., Niccodemi et al. (2020)) that develops simulation results to compare inference based on CRVE, CR2VE, CR3VE and CR3VE-λ. Given the results of both sets of simulations, we suggest researchers to prefer CR3VE-λ to CR3VE in the case of (few) highly unbalanced clusters.
For all the computations and the empirical illustrations we used Stata/SE 15.0. This paper comes with a Stata do-file that can be used with any cross-sectional dataset for the efficient computation of the standard errors based on CRVE, CR2VE, CR3VE and CR3VE-λ and with a Stata do-file to replicate the Monte Carlo simulations. The Stata do-files are available upon request.