
A New Estimator for Standard Errors with Few Unbalanced Clusters

Faculty of Humanities, Education and Social Sciences, University of Luxembourg, Esch-sur-Alzette, L-4366 Luxembourg, Luxembourg
Faculty of Economics and Business, University of Groningen, 9700 AV Groningen, The Netherlands
Author to whom correspondence should be addressed.
Econometrics 2022, 10(1), 6
Original submission received: 11 October 2021 / Revised: 17 January 2022 / Accepted: 18 January 2022 / Published: 21 January 2022


Abstract

In linear regression analysis, the variance estimator for the estimated regression coefficients should take into account the clustered nature of the data, if present, since using the standard textbook formula will in that case lead to a severe downward bias in the standard errors. The cluster-robust variance estimator (CRVE) generalizes the classical heteroskedasticity-robust estimator to clustered data. Its justification is asymptotic in the number of clusters. Although CRVE is an improvement, considerable bias can remain when the number of clusters is low, the more so when regressors are correlated within cluster. To address these issues, two improved methods have been proposed: one, which we call CR2VE, is based on bias reduced linearization, while the other, CR3VE, can be seen as a jackknife estimator. The latter is unbiased only under very strict conditions, in particular equal cluster size. To relax this condition, we introduce in this paper CR3VE-λ, a generalization of CR3VE where the cluster size is allowed to vary freely between clusters. We illustrate the performance of CR3VE-λ through simulations and show that, especially when cluster sizes vary widely, it can outperform the other commonly used estimators.

1. Introduction

In linear regressions with clustered data, it is common practice to estimate the variance of the estimated parameters using the cluster-robust variance estimator (henceforth CRVE) introduced by Liang and Zeger (1986) as a generalization of the White (1980) heteroskedasticity-robust estimator. The justification is asymptotic, with the number of clusters tending to infinity. Bell and McCaffrey (2002) show that in finite samples, with few clusters and error terms that are correlated within cluster, CRVE leads to severely downward-biased standard errors and thus to misleading inference about the estimated parameters. Moulton (1986, 1990) and Cameron and Miller (2015) point out that this issue is particularly relevant for regressors that are correlated within cluster, such as policy variables that are implemented only in certain regions or states. An additional issue for inference about the estimated parameters is that, under the null hypothesis and with few clusters, the distribution of the test statistic is unknown and approximate normality cannot be claimed.
Following Bell and McCaffrey (2002), inference about the estimated parameters can be improved by (i) reducing the bias of CRVE with either BRL (bias reduced linearization), also known as CR2VE, or the jackknife estimator $v_{JK}$, also known as CR3VE, both based on transformed OLS residuals. CR2VE and CR3VE generalize to clustered data the heteroskedasticity-consistent covariance estimators HC2 and HC3 introduced by MacKinnon and White (1985). Inference about the estimated parameters can also be improved by (ii) approximating the distribution of the test statistic with the t-distribution, using an extension of the Satterthwaite (1946) degrees of freedom (DOF) that are data-determined and regressor-specific. Imbens and Kolesar (2016) developed a more refined version of the data-determined regressor-specific DOF used by Bell and McCaffrey (2002).
Bell and McCaffrey (2002) also show that CR3VE tends to overestimate the standard errors. In this paper, we introduce CR3VE-λ, a cluster-robust variance estimator that is identical to CR3VE in the case of balanced clusters but, in the case of unbalanced clusters, takes the difference in cluster sizes into account such that the computed standard errors are less conservative and unbiased under more general conditions.
The paper is organized as follows. In Section 2, we discuss basic theory on CRVE, CR2VE and CR3VE. In Section 3, we introduce CR3VE-λ. In Section 4, we illustrate and test the performance of CRVE, CR2VE, CR3VE and CR3VE-λ to compute standard errors with few clusters using Monte Carlo simulations. In Section 5, we present ideas for future research related to the current paper. Section 6 concludes the paper.

2. Basic Theory: CRVE, CR2VE and CR3VE

Consider the regression model $y = X\beta + \varepsilon$ with observations that can be grouped into $C$ clusters of size $n_1, \ldots, n_C$, with $\sum_c n_c = n$. Write, for the $c$-th cluster, $y_c = X_c\beta + \varepsilon_c$, with $E(\varepsilon_c) = 0$ and $\mathrm{var}(\varepsilon_c) = V_c$. The $V_c$'s are collected in the block-diagonal matrix $V$. After OLS we have

$$\mathrm{var}(\hat\beta) = (X'X)^{-1} X'VX (X'X)^{-1} = (X'X)^{-1} \Big( \sum_c X_c' V_c X_c \Big) (X'X)^{-1}. \tag{1}$$

An intuitively appealing cluster-robust variance estimator (CRVE) based on OLS residuals per cluster $\hat\varepsilon_c$ is

$$\widehat{\mathrm{var}}(\hat\beta) = (X'X)^{-1} \Big( \sum_c X_c' \hat\varepsilon_c \hat\varepsilon_c' X_c \Big) (X'X)^{-1}. \tag{2}$$

This estimator, which directly generalizes White (1980) and was introduced by Liang and Zeger (1986), is consistent when the number of clusters goes to infinity. The same holds when (2) is scaled, as in Stata, by the factor $C(n-1)/[(C-1)(n-k)]$, with $k$ the number of regressors. Since this factor is larger than one, it increases the estimated variance. In the case of few clusters, asymptotics will be a poor guide. In what follows, we therefore consider its bias instead.
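As an illustration of (2), the following is a minimal Python/NumPy sketch, not the authors' Stata implementation. The function name `crve` and the `stata_scaling` flag are hypothetical names chosen for this example; the optional scaling applies the factor $C(n-1)/[(C-1)(n-k)]$ mentioned above.

```python
import numpy as np

def crve(X, y, cluster_ids, stata_scaling=False):
    """Sketch of the cluster-robust sandwich (2); names are illustrative only."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat                 # OLS residuals
    labels = np.unique(cluster_ids)
    meat = np.zeros((k, k))
    for c in labels:
        idx = cluster_ids == c
        score = X[idx].T @ resid[idx]        # X_c' e_c, a k-vector
        meat += np.outer(score, score)       # X_c' e_c e_c' X_c
    V = XtX_inv @ meat @ XtX_inv
    if stata_scaling:
        C = len(labels)
        V = V * C * (n - 1) / ((C - 1) * (n - k))
    return beta_hat, V
```

When every observation forms its own cluster, the sum in (2) collapses to the White (1980) heteroskedasticity-robust form, which gives a simple sanity check for the sketch.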
Let $M = I_n - X(X'X)^{-1}X'$, let $S_c$ be the $n \times n_c$ matrix that selects the columns of $M$ corresponding to cluster $c$, let $L_c \equiv M S_c$ and let

$$H_c \equiv S_c' M S_c = I_{n_c} - X_c (X'X)^{-1} X_c'.$$

There holds $H_c = L_c' L_c$ since $M$ is idempotent and symmetric. With $\hat\varepsilon = M\varepsilon$ and $\hat\varepsilon_c = L_c'\varepsilon$, we then have $E(\hat\varepsilon_c \hat\varepsilon_c') = L_c' V L_c \neq V_c$, so that

$$E[\widehat{\mathrm{var}}(\hat\beta)] = (X'X)^{-1} \Big( \sum_c X_c' L_c' V L_c X_c \Big) (X'X)^{-1} \neq \mathrm{var}(\hat\beta).$$

To reduce the bias, consider choosing a variance estimator based on transformed residuals $\tilde\varepsilon_c \equiv A_c \hat\varepsilon_c$, for some $A_c$. Then

$$E[\widehat{\mathrm{var}}(\hat\beta)] = (X'X)^{-1} \Big( \sum_c X_c' A_c L_c' V L_c A_c' X_c \Big) (X'X)^{-1}.$$

From (1), unbiasedness requires the $A_c$ to be such that $A_c L_c' V L_c A_c' = V_c$ for all $c$, uniformly in the $V_c$. This is infeasible and therefore we consider two second-best solutions.
The first solution is to consider the case of no cluster effects, $V_c = \sigma^2 I_{n_c}$ for all $c$, and make the estimator unbiased for this case. Then $E(\hat\varepsilon_c \hat\varepsilon_c') = L_c' V L_c = \sigma^2 L_c' L_c = \sigma^2 H_c$ and consequently

$$E[\widehat{\mathrm{var}}(\hat\beta)] = \sigma^2 (X'X)^{-1} \Big( \sum_c X_c' A_c H_c A_c' X_c \Big) (X'X)^{-1}. \tag{4}$$

The variance estimator is unbiased if $A_c H_c A_c' = I_{n_c}$ and so we choose $A_c = H_c^{-1/2}$. This estimator, introduced by Bell and McCaffrey (2002) and called BRL, is extensively discussed by Cameron and Miller (2015) and it is also known as CR2VE.
The second solution is based on the idea that the elements in $M$ outside the blocks on the diagonal may be small. Then $L_c$ can be approximated by a matrix with $H_c$ as its $c$-th block and zeros outside this block. Then $L_c' V L_c = H_c V_c H_c$ and choosing $A_c = H_c^{-1}$ leads, when scaled by a factor $(C-1)/C$, to an estimator that is approximately unbiased when there are no cluster effects. This estimator with the jackknife correction was also introduced by Bell and McCaffrey (2002), who called it $v_{JK}$; it is discussed by Cameron and Miller (2015) and is also known as CR3VE. CR2VE and CR3VE can be computationally intensive because they require the inversion of matrices of order equal to the cluster sizes. However, they can be computed efficiently, that is, with computing time and storage of order $O(n_c)$; a succinct proof is given by Niccodemi et al. (2020).
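The two residual transformations can be sketched in Python/NumPy as follows. This is a naive illustration that forms each $n_c \times n_c$ matrix $H_c$ explicitly, so it is not the efficient $O(n_c)$ computation referred to above. The function name `cr2_cr3` is hypothetical, and the sketch assumes every $H_c$ is positive definite.

```python
import numpy as np

def cr2_cr3(X, y, cluster_ids):
    """Naive CR2VE and CR3VE sketch; assumes each H_c is positive definite."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    labels = np.unique(cluster_ids)
    C = len(labels)
    meat2 = np.zeros((k, k))
    meat3 = np.zeros((k, k))
    for c in labels:
        idx = cluster_ids == c
        Xc, ec = X[idx], resid[idx]
        Hc = np.eye(len(ec)) - Xc @ XtX_inv @ Xc.T
        w, Q = np.linalg.eigh(Hc)            # H_c = Q diag(w) Q'
        if w.min() <= 1e-10:
            raise ValueError("H_c is (numerically) singular for a cluster")
        proj = Q.T @ ec
        e2 = Q @ (proj / np.sqrt(w))         # H_c^{-1/2} e_c  (CR2VE)
        e3 = Q @ (proj / w)                  # H_c^{-1} e_c    (CR3VE)
        s2, s3 = Xc.T @ e2, Xc.T @ e3
        meat2 += np.outer(s2, s2)
        meat3 += np.outer(s3, s3)
    V_cr2 = XtX_inv @ meat2 @ XtX_inv
    V_cr3 = (C - 1) / C * (XtX_inv @ meat3 @ XtX_inv)  # jackknife scaling
    return V_cr2, V_cr3
```

A single eigendecomposition of $H_c$ is reused for both transformations, since $H_c^{-1/2}$ and $H_c^{-1}$ share the eigenvectors of $H_c$.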
Both CR2VE and CR3VE are used in the literature as an alternative to bootstrapping. The bootstrap literature has evolved rapidly since Cameron et al. (2008) proposed the use of a wild cluster bootstrap procedure to improve inference in the case of few clusters. Generally, the wild cluster bootstrap procedure performs well. However, MacKinnon and Webb (2017) show that inference based on this procedure can fail in the case of dummy regressors equal to zero or one in very few clusters. Djogbenou et al. (2019) propose an asymptotic analysis of cluster-robust inference mainly focused on the wild cluster bootstrap procedure, proving its asymptotic validity under certain conditions on the cluster sizes. They show, both theoretically and through some experiments, how variation in cluster sizes affects the asymptotic validity of this procedure, and they conclude that the restricted wild cluster bootstrap using the Rademacher distribution performs better than any of its competitors.

3. From CR3VE to CR3VE-λ

To analyze the bias of CR3VE we scale (4) by $(C-1)/C$ and use

$$A_c H_c A_c' = H_c^{-1} = I_{n_c} + X_c (X'X - X_c'X_c)^{-1} X_c'$$

to obtain

$$E[\widehat{\mathrm{var}}(\hat\beta)] = \frac{C-1}{C}\, \sigma^2 \Big( (X'X)^{-1} + \sum_c (X'X)^{-1} X_c'X_c (X'X - X_c'X_c)^{-1} X_c'X_c (X'X)^{-1} \Big). \tag{5}$$

When clusters are balanced and have the same covariance structure then $X_c'X_c = X'X/C$ for all $c$, and (5) reduces to $E[\widehat{\mathrm{var}}(\hat\beta)] = \sigma^2 (X'X)^{-1}$. Thus, in the case of balanced clusters, CR3VE with the correction factor $(C-1)/C$ is unbiased.
We propose a different scaling factor than $(C-1)/C$ for CR3VE in the more general case of unbalanced clusters that still have the same covariance structure. Define $\pi_c \equiv n_c/n$ for cluster $c$. Then $X_c'X_c = \pi_c X'X$ and the expression in parentheses in (5) becomes $\lambda (X'X)^{-1}$, with

$$\lambda \equiv 1 + \sum_c \frac{\pi_c^2}{1 - \pi_c},$$

and $\lambda \geq C/(C-1)$, with equality holding in the case of balanced clusters. To see this, let $\pi \equiv (\pi_1, \ldots, \pi_C)'$, $\Pi \equiv \mathrm{diag}(\pi)$, $a \equiv (I_C - \Pi)^{-1/2}\pi$ and $b \equiv (I_C - \Pi)^{1/2}\iota_C$, so that $a'a = \pi'(I_C - \Pi)^{-1}\pi$, $b'b = \iota_C'(I_C - \Pi)\iota_C$, and $a'b = 1$. Since $(a'b)^2 \leq a'a \, b'b$ there holds

$$\sum_c \frac{\pi_c^2}{1 - \pi_c} = \pi'(I_C - \Pi)^{-1}\pi \geq \frac{1}{\iota_C'(I_C - \Pi)\iota_C} = \frac{1}{C-1},$$

so $\lambda - 1 \geq 1/(C-1)$ or $\lambda \geq C/(C-1)$. This suggests that $1/\lambda$ may be a better scaling factor than $(C-1)/C$. As $1/\lambda \leq (C-1)/C$, we propose a lower estimate of the variance than with CR3VE. This fits in well with the observation by Bell and McCaffrey (2002), as mentioned in the Introduction, that CR3VE tends to overestimate the standard errors. We denote this estimator, which is unbiased under more general conditions than CR3VE, by CR3VE-λ.
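The scaling factor depends only on the cluster sizes, so it is cheap to compute. The short Python sketch below evaluates $\lambda$ from a list of cluster sizes and compares $1/\lambda$ with $(C-1)/C$; the function name `lambda_factor` and the example sizes are illustrative assumptions, not taken from the paper.

```python
def lambda_factor(cluster_sizes):
    """lambda = 1 + sum_c pi_c^2 / (1 - pi_c), with pi_c = n_c / n."""
    if len(cluster_sizes) < 2:
        raise ValueError("at least two clusters are required")
    n = sum(cluster_sizes)
    shares = [n_c / n for n_c in cluster_sizes]
    return 1.0 + sum(p * p / (1.0 - p) for p in shares)

# Hypothetical cluster sizes, chosen only for illustration.
balanced = [1000, 1000, 1000, 1000]
unbalanced = [100, 400, 1500, 2000]

for sizes in (balanced, unbalanced):
    C = len(sizes)
    lam = lambda_factor(sizes)
    # CR3VE-lambda scales by 1/lambda, CR3VE by (C - 1)/C.
    print(sizes, "1/lambda =", round(1.0 / lam, 4), "(C-1)/C =", (C - 1) / C)
```

For the balanced sizes the two factors coincide at 0.75, while for the unbalanced sizes $1/\lambda$ is smaller, in line with the inequality $\lambda \geq C/(C-1)$.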

4. Monte Carlo Simulations

We run several sets of Monte Carlo (MC) simulations and compare the bias of the standard errors based on unclustered standard errors (UN), CRVE, CR2VE and CR3VE with the bias of the standard errors based on CR3VE-λ. In each simulation, we randomly generate $C$ unbalanced clusters with number of observations per cluster $n_c \sim U\{1000 - g, 1000 + g\}$, where $g$ is different in each set of simulations. In other words, $n_c$ is drawn from a uniform distribution with constant mean but a standard deviation that depends on $g$. We generate our dependent variable $y_{hc} = \alpha + \beta x_{hc} + \gamma d_c + e_{hc}$, where $h$ identifies the single observation (e.g., household) and $c$ identifies the $C$ clusters of size $n_c = n_1, \ldots, n_C$, and where $x_{hc} = q_{hc} + z_c$ and $e_{hc} = w_{hc} + u_c$. Moreover, $q_{hc}$, $z_c$, $w_{hc}$, $u_c$ are independently drawn from $N(0, 1)$, $\alpha = 0$ and $\beta = \gamma = 1$, and $d_c$ is a dummy variable constant within cluster and randomly constrained, in each simulation, to be equal to 1 in half of the randomly generated clusters. The simulation set-up is somewhat similar to the one in Cameron et al. (2008). As pointed out by Cameron and Miller (2015), unclustered standard errors and CRVE are likely to be severely biased if the cluster effect and the correlation of the regressors within cluster are different from zero. Therefore, we set up experiments that allow both $e_{hc}$ and the regressors to be correlated within cluster, including the extreme case of $d_c$, a dummy variable that is constant within cluster. The presence of regressors correlated within cluster implies that the assumptions under which CR3VE and CR3VE-λ are unbiased are not met. Yet, CR3VE-λ takes into account the difference in cluster size and, as this difference increases, it is expected to be less biased than CR3VE.
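The data-generating process described above can be sketched in Python/NumPy as follows. This is an illustrative reconstruction under the stated design, not the authors' Stata do-file; the function name `simulate_dataset` and its return layout are assumptions made for this example, and $C$ is assumed to be even so that exactly half of the clusters receive $d_c = 1$.

```python
import numpy as np

def simulate_dataset(C, g, rng):
    """One draw from the DGP: y = 0 + 1*x + 1*d + e with cluster effects."""
    # Cluster sizes: discrete uniform on {1000 - g, ..., 1000 + g}
    # (the upper bound of rng.integers is exclusive, hence the +1).
    sizes = rng.integers(1000 - g, 1000 + g + 1, size=C)
    ids = np.repeat(np.arange(C), sizes)
    n = ids.size
    # d_c equals 1 in a randomly chosen half of the clusters.
    d_cluster = np.zeros(C)
    d_cluster[rng.permutation(C)[: C // 2]] = 1.0
    z = rng.normal(size=C)                   # cluster component of x
    u = rng.normal(size=C)                   # cluster component of e
    x = rng.normal(size=n) + z[ids]          # x_hc = q_hc + z_c
    e = rng.normal(size=n) + u[ids]          # e_hc = w_hc + u_c
    d = d_cluster[ids]
    y = 0.0 + 1.0 * x + 1.0 * d + e          # alpha = 0, beta = gamma = 1
    X = np.column_stack([np.ones(n), x, d])
    return X, y, ids, sizes
```

A call such as `simulate_dataset(4, 900, np.random.default_rng(0))` produces one simulated dataset for the four-cluster design with $g = 900$.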
We run 100,000 simulations for each MC set and each MC set differs with respect to the number of clusters $C$ and $g$. We show results for $C = 4$ and $C = 6$, and for $g = 0$ (i.e., balanced clusters), $g = 250$, $g = 500$, $g = 900$ and $g = 990$, with standard deviation of the cluster size equal to 0, 145, 289, 520 and 572, respectively. For each simulation: (i) we compute the true standard deviation of $\hat\beta$, $\mathrm{sd}(\hat\beta)$, based on

$$\mathrm{var}(\hat{\boldsymbol\beta}) = (X'X)^{-1} \Big( \sum_c X_c' V_c X_c \Big) (X'X)^{-1},$$

$$V_c = I_{n_c} + \iota_c \iota_c',$$

and where $\boldsymbol\beta = (\alpha, \beta, \gamma)'$, (ii) we compute the standard errors of $\hat\beta$ and of $\hat\gamma$ based on the different methods, $se_{UN}$, $se_{CRVE}$, $se_{CR2VE}$, $se_{CR3VE}$ and $se_{CR3VE\text{-}\lambda}$, (iii) we compute the difference between the standard errors based on the different methods and the true standard deviations $\mathrm{sd}(\hat\beta)$ and $\mathrm{sd}(\hat\gamma)$. Finally, for each MC set we compute the mean of this difference (i.e., the estimated bias) for each method to compute the standard errors. From Table 1 and Table 2 we can see that CR3VE-λ always leads to the least biased standard errors, with estimated bias always close to zero. Moreover, it remarkably reduces the estimated bias of CR3VE with high unbalancedness. This is especially true for the dummy variable $d_c$.
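Step (i) does not require building the $n_c \times n_c$ matrices $V_c$: with $V_c = I_{n_c} + \iota_c\iota_c'$ we have $X_c'V_cX_c = X_c'X_c + (X_c'\iota_c)(X_c'\iota_c)'$. The Python/NumPy sketch below uses this identity; the function names `true_variance` and `estimated_bias` are hypothetical and the code is an illustration rather than the authors' implementation.

```python
import numpy as np

def true_variance(X, cluster_ids):
    """var(beta_hat) under V_c = I + iota iota', without forming V_c."""
    X = np.asarray(X, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    k = X.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = np.zeros((k, k))
    for c in np.unique(cluster_ids):
        Xc = X[cluster_ids == c]
        col_sums = Xc.sum(axis=0)            # X_c' iota_c
        meat += Xc.T @ Xc + np.outer(col_sums, col_sums)
    return XtX_inv @ meat @ XtX_inv

def estimated_bias(se_draws, sd_draws):
    """Mean over simulations of (standard error - true standard deviation)."""
    return float(np.mean(np.asarray(se_draws) - np.asarray(sd_draws)))
```

The true standard deviations are the square roots of the diagonal of `true_variance(X, ids)`; these are conditional on the simulated regressors, so they are recomputed in every simulation.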
We acknowledge that the reader might be particularly interested in comparing the inferential performance of the various CRVEs, including CR3VE-λ, especially in a real-data setting. For this purpose we refer the reader to Niccodemi et al. (2020), where inferential results based on the Current Population Survey data clustered in few, highly unbalanced clusters and the t-distribution using the Imbens and Kolesar (2016) DOF are reported. This experiment is similar to the one developed by Cameron and Miller (2015), although more focused on cluster unbalancedness. According to the results, with few, highly unbalanced clusters CR3VE-λ appears to be among the most promising methods for inference, as CR3VE tends to underreject a true null hypothesis.

5. A Note on Future Research

Future research on cluster-robust variance estimators, directly linked to the current work, might take at least two directions. First, Djogbenou et al. (2019) show through some experimental designs how the variation in cluster sizes affects the asymptotic validity of the wild cluster bootstrap. Testing how CR3VE-λ performs, in comparison to CR2VE and CR3VE and using the same experimental designs, might provide further elements to evaluate its performance.
Second, the effective number of clusters introduced by Carter et al. (2017) might be of particular interest for CR3VE-λ. The effective number of clusters depends, among other things, on the cluster sizes. If the effective and the nominal number of clusters differ remarkably, and if this difference is, to some extent, due to heterogeneity in cluster sizes, then inference using CR3VE-λ might be much more accurate than inference based on CR3VE. Therefore, it would be interesting to develop experiments that focus on the interaction between the effective number of clusters as a diagnostic tool and the use of CR3VE-λ instead of CR3VE for inference. Of course, other possibilities include the use of the effective number of clusters to construct the scaling factor for CR3VE and the introduction of measures of the effective size of the clusters to compute CR3VE-λ.

6. Conclusions

We propose CR3VE-λ, an estimator for clustered standard errors that improves on the jackknife estimator and is unbiased under more general conditions in the case of few unbalanced clusters. In simulations, CR3VE-λ reduces the bias of CR3VE as the unbalancedness of the clusters increases. We also provide a reference to a longer working paper (i.e., Niccodemi et al. (2020)) that develops simulation results to compare inference based on CRVE, CR2VE, CR3VE and CR3VE-λ. Given the results of both sets of simulations, we suggest that researchers prefer CR3VE-λ to CR3VE in the case of (few) highly unbalanced clusters.
For all the computations and the empirical illustrations we used Stata/SE 15.0. This paper comes with a Stata do-file that can be used with any cross-sectional dataset for the efficient computation of the standard errors based on CRVE, CR2VE, CR3VE and CR3VE-λ and with a Stata do-file to replicate the Monte Carlo simulations. The Stata do-files are available upon request.

Author Contributions

All authors have contributed equally to the research. All authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.


Acknowledgments

We are grateful to Viola Angelini, Rob Alessie, Nick Koning, Erik Meijer, Douglas Miller, Ulrich Schneider, Roberto Wessels and four referees for their helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.


References

  1. Bell, Robert M., and Daniel F. McCaffrey. 2002. Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology 28: 169–79.
  2. Cameron, A. Colin, and Douglas L. Miller. 2015. A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50: 317–72.
  3. Cameron, A. Colin, Jonah B. Gelbach, and Douglas L. Miller. 2008. Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics 90: 414–27.
  4. Carter, Andrew V., Kevin T. Schnepel, and Douglas G. Steigerwald. 2017. Asymptotic behavior of a t-test robust to cluster heterogeneity. Review of Economics and Statistics 99: 698–709.
  5. Djogbenou, Antoine A., James G. MacKinnon, and Morten Ørregaard Nielsen. 2019. Asymptotic theory and wild bootstrap inference with clustered errors. Journal of Econometrics 212: 393–412.
  6. Imbens, Guido W., and Michal Kolesar. 2016. Robust standard errors in small samples: Some practical advice. Review of Economics and Statistics 98: 701–12.
  7. Liang, Kung-Yee, and Scott L. Zeger. 1986. Longitudinal data analysis using generalized linear models. Biometrika 73: 13–22.
  8. MacKinnon, James G., and Halbert White. 1985. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29: 305–25.
  9. MacKinnon, James G., and Matthew D. Webb. 2017. Wild bootstrap inference for wildly different cluster sizes. Journal of Applied Econometrics 32: 233–54.
  10. Moulton, Brent R. 1986. Random group effects and the precision of regression estimates. Journal of Econometrics 32: 385–97.
  11. Moulton, Brent R. 1990. An illustration of a pitfall in estimating the effects of aggregate variables on micro units. Review of Economics and Statistics 72: 334–38.
  12. Niccodemi, Gianmaria, Rob Alessie, Viola Angelini, Jochen Mierau, and Thomas Wansbeek. 2020. Refining Clustered Standard Errors with Few Clusters. SOM Research Report 2021002. Groningen: University of Groningen.
  13. Satterthwaite, Franklin E. 1946. An approximate distribution of estimates of variance components. Biometrics Bulletin 2: 110–14.
  14. White, Halbert L. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817–38.
Table 1. Estimated bias of $se(\hat\beta)$ based on different methods: 100,000 Monte Carlo simulations.

| Std. deviation of cluster size | 0 | 145 | 289 | 520 | 572 |
|---|---|---|---|---|---|
| 4 clusters | | | | | |
| $\hat{E}[\mathrm{sd}(\hat\beta)]$ | 0.1978 | 0.1967 | 0.1929 | 0.1790 | 0.1745 |
| $\widehat{\mathrm{Bias}}[se_{UN}(\hat\beta)]$ | −0.1820 | −0.1809 | −0.1769 | −0.1628 | −0.1581 |
| $\widehat{\mathrm{Bias}}[se_{CRVE}(\hat\beta)]$ | −0.1293 | −0.1271 | −0.1207 | −0.1069 | −0.1043 |
| $\widehat{\mathrm{Bias}}[se_{CR2VE}(\hat\beta)]$ | −0.0667 | −0.0663 | −0.0644 | −0.0605 | −0.0599 |
| $\widehat{\mathrm{Bias}}[se_{CR3VE}(\hat\beta)]$ | 0.0191 | 0.0192 | 0.0188 | 0.0164 | 0.0157 |
| $\widehat{\mathrm{Bias}}[se_{CR3VE\text{-}\lambda}(\hat\beta)]$ | 0.0191 | 0.0184 | 0.0157 | 0.0066 | 0.0040 |
| 6 clusters | | | | | |
| $\hat{E}[\mathrm{sd}(\hat\beta)]$ | 0.1839 | 0.1837 | 0.1829 | 0.1811 | 0.1807 |
| $\widehat{\mathrm{Bias}}[se_{UN}(\hat\beta)]$ | −0.1709 | −0.1707 | −0.1699 | −0.1679 | −0.1675 |
| $\widehat{\mathrm{Bias}}[se_{CRVE}(\hat\beta)]$ | −0.0775 | −0.0774 | −0.0792 | −0.0844 | −0.0868 |
| $\widehat{\mathrm{Bias}}[se_{CR2VE}(\hat\beta)]$ | −0.0301 | −0.0300 | −0.0325 | −0.0386 | −0.0413 |
| $\widehat{\mathrm{Bias}}[se_{CR3VE}(\hat\beta)]$ | 0.0198 | 0.0208 | 0.0199 | 0.0202 | 0.0195 |
| $\widehat{\mathrm{Bias}}[se_{CR3VE\text{-}\lambda}(\hat\beta)]$ | 0.0198 | 0.0204 | 0.0182 | 0.0142 | 0.0120 |

Table 2. Estimated bias of $se(\hat\gamma)$ based on different methods: 100,000 Monte Carlo simulations.

| Std. deviation of cluster size | 0 | 145 | 289 | 520 | 572 |
|---|---|---|---|---|---|
| 4 clusters | | | | | |
| $\hat{E}[\mathrm{sd}(\hat\gamma)]$ | 1.0209 | 1.0250 | 1.0369 | 1.0847 | 1.1066 |
| $\widehat{\mathrm{Bias}}[se_{UN}(\hat\gamma)]$ | −0.9805 | −0.9843 | −0.9957 | −1.0416 | −1.0623 |
| $\widehat{\mathrm{Bias}}[se_{CRVE}(\hat\gamma)]$ | −0.4700 | −0.4790 | −0.5038 | −0.6066 | −0.6533 |
| $\widehat{\mathrm{Bias}}[se_{CR2VE}(\hat\gamma)]$ | −0.1868 | −0.1953 | −0.2181 | −0.3191 | −0.3703 |
| $\widehat{\mathrm{Bias}}[se_{CR3VE}(\hat\gamma)]$ | 0.1005 | 0.1000 | 0.1023 | 0.1054 | 0.1068 |
| $\widehat{\mathrm{Bias}}[se_{CR3VE\text{-}\lambda}(\hat\gamma)]$ | 0.1005 | 0.0960 | 0.0856 | 0.0410 | 0.0225 |
| 6 clusters | | | | | |
| $\hat{E}[\mathrm{sd}(\hat\gamma)]$ | 0.8306 | 0.8355 | 0.8506 | 0.9059 | 0.9276 |
| $\widehat{\mathrm{Bias}}[se_{UN}(\hat\gamma)]$ | −0.7965 | −0.8013 | −0.8163 | −0.8706 | −0.8919 |
| $\widehat{\mathrm{Bias}}[se_{CRVE}(\hat\gamma)]$ | −0.2478 | −0.2531 | −0.2786 | −0.3628 | −0.3953 |
| $\widehat{\mathrm{Bias}}[se_{CR2VE}(\hat\gamma)]$ | −0.0837 | −0.0861 | −0.1057 | −0.1653 | −0.1894 |
| $\widehat{\mathrm{Bias}}[se_{CR3VE}(\hat\gamma)]$ | 0.0524 | 0.0556 | 0.0514 | 0.0564 | 0.0610 |
| $\widehat{\mathrm{Bias}}[se_{CR3VE\text{-}\lambda}(\hat\gamma)]$ | 0.0524 | 0.0537 | 0.0436 | 0.0265 | 0.0223 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Niccodemi, G.; Wansbeek, T. A New Estimator for Standard Errors with Few Unbalanced Clusters. Econometrics 2022, 10, 6.