Abstract
We examine properties of permutation tests in the context of synthetic control. Permutation tests are frequently used methods of inference for synthetic control when the number of potential control units is small. We analyze the permutation tests from a repeated sampling perspective and show that the size of permutation tests may be distorted. Several alternative methods are discussed.
JEL Classification:
C12
1. Introduction
The synthetic control method, proposed and discussed by Abadie and Gardeazabal (2003) and Abadie et al. (2010), is a very useful way of conducting comparative studies when exact matches are unavailable. Estimation of treatment effects usually takes the form of comparing outcomes between the treated unit and the control unit. Common sense suggests that, for the comparison to be meaningful, the control unit needs to be similar to the treated unit in various dimensions in the absence of the treatment. Such a requirement may not be satisfied in many observational studies. In some cases, the availability of panel data makes such comparisons reasonable, the difference-in-differences method being a very well-known example. The difference-in-differences method, however, relies on a very specific assumption, namely the common trend assumption, which may not be plausible in many applications. The synthetic control method offers a sensible generalization of difference-in-differences. The synthetic control is a linear combination of the potential control outcomes, where the weights are constructed by analyzing the pre-intervention outcomes.
For the purpose of statistical inference with synthetic control, i.e., confidence intervals and hypothesis tests, various versions of placebo tests are often adopted. The idea underlying the placebo tests is that of the usual permutation test, where the critical value of a test statistic is computed from all possible permutations of the “treatment” assignment among the control units.
The idea of the permutation test is very intuitive and attractive. Applying the synthetic control method to every potential control unit presumably allows researchers to assess the distribution of a test statistic under the null hypothesis of no treatment effects, and the inference is seemingly exact in the sense that the burden of asymptotic approximation can be obviated.
The purpose of this paper is very specific. We ask whether the permutation test is a reasonable idea in the context of the synthetic control method, and argue that the intuitive appeal of the permutation test is misplaced. The validity of permutation tests usually requires a certain symmetry assumption, which is often violated in the context of synthetic control studies. Using Monte Carlo simulations, we document the size distortion of the permutation tests. We also discuss a few alternative methods of inference.
Alberto Abadie kindly pointed out that the placebo test in synthetic control is often based on the idea of randomization inference, under which the symmetry restriction is built in, while our analysis is predicated on the usual random sampling perspective, which leads to the violation of symmetry. This perspective is shared by an anonymous referee, who notes that (i) the synthetic control literature uses permutation tests in the context of design-based inference, and, as such, the permutation tests have exact size; (ii) the present article shows that permutation tests may not have correct size under a different mode of inference based on repeated sampling, although interpreting the permutation tests in the previous literature as tests based on repeated sampling would be incorrect; and (iii) the present article also proposes some alternatives that are valid in a repeated sampling setting. It would be useful to understand the exact mechanism through which the difference between the two perspectives manifests itself. The same referee points out that the present paper adopts a setting in which the number of pre-intervention periods grows without bound, while the original literature assumes that it is fixed.
2. Placebo Test and Synthetic Control
In this section, we provide a brief discussion of the placebo test in the context of the synthetic control method. We begin with an overview of the synthetic control, borrowing heavily from discussions in Abadie et al. (2010) and Doudchenko and Imbens (2016). We then move on to describe the placebo test, and point out the importance of the symmetry assumption. We argue that the symmetry assumption is violated in general for placebo tests using linear combinations of outcomes, such as synthetic control. We conclude this section by arguing that such a violation should be expected in general even when a normalized version of the test statistic is adopted.
We start with an overview of the synthetic control method. Consider panel data with cross sectional units observed over the time periods . Units are the control units that receive the treatment in none of the time periods. The unit receives no treatment in periods , and receives active treatment in time periods . For simplicity, we will often assume that . The outcome variable is such that if the jth unit receives treatment in time t, and otherwise. Obviously,
The idea underlying the synthetic control is that if there were some weights1 such that
during the pre-intervention periods (), then can be used as a (synthetic) control for during the post-intervention periods (). Abadie et al. (2010) and Doudchenko and Imbens (2016) discuss various methods of finding the ’s so that the requirement in Equation (1) is satisfied. We analyze the weights and the nature of the approximation from the asymptotic perspective where . Note that a special case of the estimator discussed by Abadie et al. (2010, p. 496) solves
Under our interpretation, above is an estimator of .
Suppose that , , and satisfy strict stationarity. Without loss of generality, we also assume that and . We would have in probability as . Assuming that satisfies
we can understand that the population version of the synthetic control is such that the difference is designed to have mean zero.
Our asymptotic interpretation is not the only possible one. Doudchenko and Imbens (2016) provide an in-depth analysis of many possible methods. Our interpretation, however, is helpful for two reasons. First, it yields a concrete interpretation of s as estimates of some pseudo-parameters, say ’s, along with analytic expressions of the ’s, which makes it easy to understand the potential pitfalls of permutation methods afterwards. Second, it helps us to motivate alternative methods of inference exploiting time series variation.
We now discuss how placebo tests can be used in the context of synthetic control. For this purpose, we first present a summary of the placebo tests/permutation tests. The tests are designed to deal with the case where the number of treated units is small and the number of controls is relatively large. In order to focus on the salient feature of the tests, we will consider an extreme case and assume that there is only one treated unit.
The basic intuition underlying the general placebo test can be gleaned by examining a standard textbook case of randomized treatments. Suppose that there is cross sectional data with units, where the units are the control units and the unit receives the active treatment. A reasonable estimator of the treatment effect is the difference , where is the outcome of the unit , and denotes the average of the outcomes of the controls. Suppose that we are interested in testing whether the treatment had an impact. Given that there is only one treated unit, the standard t-test comparing the difference of the mean outcomes is not applicable. On the other hand, common sense suggests that we may implement such a test by “assigning” each control unit to a fictitious treatment. More precisely, one can estimate the empirical distribution of for , and use it as if it were the distribution of the treatment effect under the null hypothesis.3
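As a rough illustration of the textbook placebo idea just described, the following Python sketch computes a placebo "effect" for each control unit and a two-sided placebo p-value. The function name `placebo_test`, the choice to compare each placebo unit with the mean of the remaining controls, and the p-value convention are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def placebo_test(y_treated, y_controls):
    """Textbook placebo test with one treated unit (illustrative sketch)."""
    y_controls = np.asarray(y_controls, dtype=float)
    n = y_controls.size
    # Estimated effect: treated outcome minus the average control outcome.
    effect = y_treated - y_controls.mean()
    # Placebo effects: pretend control k was treated and compare it with
    # the average of the remaining controls (an assumed convention).
    others_mean = (y_controls.sum() - y_controls) / (n - 1)
    placebo = y_controls - others_mean
    # Share of placebo effects at least as large in magnitude as the estimate.
    p_value = np.mean(np.abs(placebo) >= abs(effect))
    return effect, placebo, p_value
```

A small p-value here means that the estimated effect lies in the tails of the placebo distribution. Whether that distribution is a valid reference distribution is exactly the symmetry question examined in the rest of this section.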
Implementation of the placebo test with synthetic control requires a bit more notation. First let denote the estimator of . Although we will use the method of exact balancing later in our Monte Carlo simulations, we do not need to restrict ourselves to this particular estimator. For now, we can view as an output from a blackbox and let denote its probability limit as . Second, let denote the outcome of the same blackbox except that we use the kth unit as the outcome of the treated unit, and with as our control units. The placebo test then uses the empirical distribution of for as if it were the distribution of the treatment effect under the null hypothesis of no treatment effect. If the estimated effect belongs to the extreme tails of the empirical distribution, it is taken as evidence that the null hypothesis is incorrect.
In order to understand the size property of the placebo test, it helps to recall that the placebo test is a version of the permutation test, which requires for its validity what may be called the symmetry assumption. For a review of this property, we borrow the short discussion in Canay et al. (2017).4 Suppose that a researcher observes a vector of observations X, whose joint distribution is P. The objective is to test whether , where is a collection of probability distributions such that the distribution of X is equal to that of for every g in , where is a finite collection of transformations. The permutation test has exact size if, for the test statistic , the critical value is taken from the distribution of over all g in . In the context of the placebo test above, one can understand X to be the vector , and to be the permutations of the Ys.
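The placebo procedure with a black-box weight estimator can be sketched as follows. This is a minimal sketch under stated assumptions: `fit_weights` is a hypothetical stand-in for the black box (constrained least squares with weights summing to one, not the exact balancing estimator used later), there is a single post-intervention period, and each placebo run uses the remaining control units as its donor pool.

```python
import numpy as np

def fit_weights(y_pre, X_pre):
    """Stand-in black box: least-squares weights that sum to one."""
    J = X_pre.shape[1]
    ones = np.ones(J)
    # KKT system for: minimize ||y - X w||^2 subject to sum(w) = 1.
    kkt = np.block([[2 * X_pre.T @ X_pre, ones[:, None]],
                    [ones[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([2 * X_pre.T @ y_pre, [1.0]])
    solution = np.linalg.lstsq(kkt, rhs, rcond=None)[0]
    return solution[:J]  # drop the Lagrange multiplier

def post_gap(y, X, T0):
    """Gap between a unit and its synthetic control in the first post period."""
    w = fit_weights(y[:T0], X[:T0])
    return y[T0] - X[T0] @ w

def sc_placebo(y0, Y, T0):
    """Estimated effect for the treated unit and placebo effects for controls."""
    effect = post_gap(y0, Y, T0)
    # Treat control k as if it were treated; the other controls are donors.
    placebo = np.array([
        post_gap(Y[:, k], np.delete(Y, k, axis=1), T0)
        for k in range(Y.shape[1])
    ])
    return effect, placebo
```

Here `y0` holds the treated unit's outcomes over time and `Y` is a matrix with one column per control unit. The estimated effect is then compared with the empirical distribution of `placebo`.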
We note that the symmetry is not mathematically obvious in the context of synthetic control. In order for the permutation test to be valid, it is necessary for the distribution of and those of for to be identical. Even for the relatively simple model in Equation (3), the nature of the synthetic control is such that the symmetry does not naturally follow. Using the restriction in Equation (4), we may write
Even if the first two terms on the right-hand side of Equation (5) were identically equal to zero over the permutations, we believe that the third term is not likely to satisfy the symmetry property. This is because we believe that under the further restriction that the ’s have a finite variance, the term can be symmetric only when they are normally distributed.
We show that normality is necessary if the distribution of the error term in Equation (5), where , is to be symmetric up to normalization.5 Suppose that are i.i.d., and their common distribution is such that the variance is finite and the characteristic function does not vanish. If is a nontrivial function of s and s, then symmetry over the permutations requires that the marginal distributions of for remain invariant over all possible s. Without loss of generality, we can focus on the distribution of , and conclude that the symmetry requires that there exists a random variable Y such that the distribution of is the same as that of for some scalar c. Because the standard deviation of is proportional to , we may without loss of generality take . This implies that the distribution of only depends on . In other words, for such that , the distribution of is identical to that of . In particular, let all components of be zero except for the first one. Then, the distribution of is identical to that of . This implies that should have a stable distribution.6 Because the only stable distribution with a finite variance is the normal distribution, we conclude that normality is a necessary condition for symmetry (up to normalization). Note that the third term in Equation (5) arises in an ideal situation where the weights do not need to be estimated and the first two terms completely disappear. Our analysis suggests that even if we normalize the third term by its standard deviation, symmetry requires a normal distribution. The necessity of normality holds for any linear combination, so it applies a fortiori to synthetic control.
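One simple way to see why the weights matter for non-normal errors is a fourth-moment calculation. The sketch below assumes i.i.d. errors with a finite fourth moment and a common excess kurtosis `kappa` (a stronger moment condition than the finite variance used above); the function name is hypothetical. It relies on the standard fact that cumulants of independent terms add, so the standardized linear combination has excess kurtosis `kappa * sum(w**4) / sum(w**2)**2`.

```python
import numpy as np

def normalized_excess_kurtosis(weights, kappa):
    """Excess kurtosis of w'e / sd(w'e) for i.i.d. errors e with excess kurtosis kappa."""
    w = np.asarray(weights, dtype=float)
    # Fourth cumulants add across independent terms, so the excess kurtosis
    # of the linear combination is kappa * sum(w^4) / (sum(w^2))^2.
    return kappa * np.sum(w**4) / np.sum(w**2) ** 2
```

For normal errors (`kappa = 0`) the result is zero for every weight vector. For any other value of `kappa` it changes with the weights, so two placebo units with different weight vectors cannot share the same standardized distribution. This is only a necessary-condition check and does not replace the stable-distribution argument.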
3. Monte Carlo
The discussion at the end of the previous section casts doubt on the placebo test, even for the simple case where the first two terms in Equation (5) can be ignored. In order to understand the roles that the first two terms may play, we use Monte Carlo simulations. We look for data generating processes (DGPs hereafter) that generate large size distortions. This is helpful in understanding the potential problem of the placebo test from the uniformity perspective; after all, the mathematical definition of the “size” of a test is the maximum probability of rejection under the null, and here the null hypothesis is a composite hypothesis where the only requirement on the DGP is that the treatment effect is zero, which allows many possibilities for the terms in Equation (5). For this purpose, we found it most convenient to work with the first two terms in Equation (5), although we acknowledge that there may be other important sources of size distortion that we have not explored. Since the last paragraph of Section 2 showed that normalization does not remove the symmetry requirement, we examine the importance of the first two terms in Equation (5) using a more natural statistic. The version of the synthetic control that we use in the Monte Carlo is the method of exact balancing, the population version of which minimizes subject to and .7
The method of exact balancing may not be an ideal version of the synthetic control, but it reflects a certain ambiguity in the method of synthetic control. In the factor model in Equation (3), it is impossible to find weights such that for every , if is large enough, as long as is continuously distributed. In other words, the condition (2) in Abadie et al. (2010) is incompatible with the factor model unless . The assumption has at least two implications.8 First, the weights can be estimated without error with sufficiently large . Second, the distribution of the permutation test would have a point mass at zero, and as such, there is no reason to conduct any test. Both implications are questionable. In any case, under the assumption , the weights can be estimated (without error) by the method of least squares that minimizes . If the assumption is violated, the method of least squares would be subject to a version of the measurement error problem; the true regressor there is in Equation (3), and the plays the role of a regressor with measurement error .9 Note that such a problem is avoided by the method of exact balancing.
We consider the method of exact balancing in this section not because it is necessarily an ideal version of the synthetic control, but because it is a convenient way of examining the impact of the first two terms in Equation (5). As mentioned at the beginning of this section, our analysis at the end of the previous section suggests that the placebo test may have a problem even when these two terms are dismissed, and the purpose of our Monte Carlo exercise is to focus on the potential impact of these two terms.
For our Monte Carlo analysis, we adopted a simplified version of the factor model in Equation (3) such that (i) ; (ii) is a scalar; (iii) ; (iv) is i.i.d. over j and t. In matrix notation, our estimator solves
where ℓ is a vector of ones. Because , we can see that the population counterpart solves
where .
We now write
where
Note that the term B(i) is equal to 0 by design here, although it can be in principle different from 0 depending on the DGP and the estimator chosen. We speculate that the placebo test is used in the hope that (a) is dominated by the term D(i) above; (b) the four terms A, B(ii), C(ii) and D(ii) above, which reflect the noise of estimating by , are ignorable; and (c) the two terms C(i) and D(i) more or less satisfy the symmetry property.
We argued in the previous section that the term D(i) is likely to violate the symmetry property. In order to assess the impacts of other terms, we consider the following variations in DGPs:
- Vary the values of ’s such that (a) none of the components of dominates; (b) only two of the elements are non-zero.
- Vary the values of ’s such that the unbalanced unobservable factors C(i) (a) disappear; and (b) are present.
- Vary such that the estimation errors in the weights are (a) prominent; and (b) negligible.
Combinations of the first two variations give us four different DGPs, shown as DGP No. 1 to No. 4 in Table 1.
Table 1.
Data Generating Processes (DGPs) that generate size distortion.
We considered two versions of the placebo tests: the first one is what might be called a feasible version of the test. Formally, for , let be a vector of outcomes for the jth control unit, let , and let be a matrix that deletes the jth column from Y. Then,
Similar to Equation (6), define the leave-one-out synthetic control weights for the jth control unit as a solution to
where is to delete the jth element from . We likewise define the population counterpart as a solution to
For and , let be the element in that corresponds to the kth control unit. In addition, define for . Then, for , we can compute
Let be the order statistics of ’s. We reject if or .
The second test is an infeasible version of the test, which is identical to the first test, except that we use the true value of , i.e.,
and we reject if or .
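The rejection rule shared by the feasible and infeasible tests compares the treated unit's statistic with order statistics of the placebo statistics. The sketch below is a hedged illustration: the rank parameter `m` and the function name are hypothetical, and the particular order statistics used in the paper are not reproduced here. The returned nominal size uses the standard fact that, if the J + 1 statistics were exchangeable with no ties, the rank of the treated unit's statistic would be uniformly distributed.

```python
import numpy as np

def order_statistic_test(effect, placebo, m=1):
    """Reject if `effect` lies outside the m-th smallest / m-th largest placebo value."""
    placebo_sorted = np.sort(np.asarray(placebo, dtype=float))
    J = placebo_sorted.size
    if not (1 <= m and 2 * m <= J):
        raise ValueError("m must satisfy 1 <= m and 2 * m <= J")
    lower = placebo_sorted[m - 1]  # m-th smallest placebo statistic
    upper = placebo_sorted[J - m]  # m-th largest placebo statistic
    reject = bool(effect < lower or effect > upper)
    # Rejection probability under exchangeability (no ties): 2m / (J + 1).
    nominal_size = 2 * m / (J + 1)
    return reject, nominal_size
```

With 19 placebo statistics and `m=1`, for example, the nominal size under exchangeability is 0.1. When the symmetry assumption fails, the actual rejection rate can differ from this value, which is what the Monte Carlo exercise measures.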
For each DGP, we try , and . For all designs, we set the level of the tests to be , and the number of Monte Carlo runs to be 1000.
The results are summarized in Table 2.10 We see size distortions in Table 2, especially for DGPs No. 2 and No. 4. The size distortion there cannot be attributed to the noise of estimating . First, the problem persists even as approaches unrealistically large values. Second, the size distortion is similar across the feasible and infeasible versions of the test. We suspect that the problem is fundamental and may have something to do with the violation of symmetry. (An anonymous referee pointed out that DGPs No. 2 and No. 4 cannot produce synthetic controls that approximate the trajectory of the outcome for the treated, and that synthetic controls should not be applied in those settings.)
Table 2.
Null rejection rates of permutation tests.
Our Monte Carlo analysis indicates that the placebo test does suffer from size distortion. The results in Table 2 suggest that the size problem is potentially bigger in DGPs No. 2 and No. 4. DGPs No. 2 and No. 4 differ from No. 1 and No. 3 in that the ’s are nonzero and, as a consequence, the aggregate shock plays a role. Therefore, it is of interest to investigate further sources of asymmetry. For this purpose, we revisit the decomposition in Equation (5) of , assuming that the first and second terms in the factor model in Equation (3) are not present:11
This implies that the variance of can be written as
under the assumptions of the DGPs, where is the covariance matrix of the vector . Likewise, the variances of the permutation statistics are
Depending on the relative magnitudes of ’s, we can easily construct examples that violate the symmetry, such as DGPs No. 2 and No. 4. As of now, it is not clear to us whether there is another avenue (other than the variation in the size of ) that leads to a violation of the symmetry.
4. Possible Alternatives to Placebo Tests
If we take the time series asymptotics () seriously, the problem can be avoided by using the same idea as in Andrews (2003). The hypothesis of no treatment effects can be understood to be a hypothesis of stationarity of the time series . In particular, the researcher is interested in whether the distribution of is the same as that of , for which Andrews (2003)’s test is well-suited. In the simple case that we consider where , one rejects the null if belongs to the extreme tails of the empirical distribution of . We conducted Monte Carlo simulations for all the DGPs considered in the previous section, and verified that Andrews (2003)’s test suffered no size distortion.12 Andrews (2003)’s test is geared for application in time series, and as such, is robust to certain forms of heteroscedasticity. If the variances of in Equation (3) were different across js, most of the available methods exploiting cross sectional variation may need to be used with caution, as noted by Ferman and Pinto (2017). Because Andrews (2003)’s end-of-sample instability test is a test of stationarity of , its validity does not depend on whether the ’s have identical variances or not. The usefulness of Andrews (2003)’s test in this context was recognized earlier by Ferman and Pinto (2017).
Andrews (2003)’s test relies heavily on time series variation. When is relatively small, perhaps the researcher would like to have a procedure that is based on cross sectional variation. If the factor structure is taken seriously and if the number of factors is a priori known, we can produce such a procedure by combining the ideas in Conley and Taber (2011) and Holtz-Eakin et al. (1988). For simplicity, assume that the model is given by
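The simple case with a single post-intervention period can be sketched as follows. This is a simplified illustration in the spirit of the end-of-sample test, not a full implementation of Andrews (2003): it takes the pre-intervention gaps between the treated unit and its synthetic control as given, and it omits the details of how the weights are estimated. The function name, the use of empirical quantiles, and the default level are assumptions.

```python
import numpy as np

def end_of_sample_test(pre_gaps, post_gap, alpha=0.1):
    """Reject if the post-period gap is in the tails of the pre-period gaps."""
    pre_gaps = np.asarray(pre_gaps, dtype=float)
    # Empirical quantiles of the pre-intervention gaps serve as critical values.
    lower, upper = np.quantile(pre_gaps, [alpha / 2, 1 - alpha / 2])
    return bool(post_gap < lower or post_gap > upper)
```

Because the reference distribution comes from the same unit's own time series, the comparison does not rely on symmetry across cross sectional units.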
where we normalize . Let and . This is a case where , and . We then have
Under a strict exogeneity assumption on the x’s, we can consistently estimate as by using the control group. Now, assume that are i.i.d., which would imply
are i.i.d. A simple modification of Conley and Taber (2011)’s argument establishes that the distribution of can be consistently estimated by the empirical distribution of
where denotes Holtz-Eakin et al. (1988)’s estimator. Therefore, in order to test that , it suffices to consider a test that rejects whenever
is in the extreme tails of such empirical distribution. Ahn et al. (2013), for example, discussed how Holtz-Eakin et al. (1988)’s method can be generalized when there are multiple factors. The idea of combining Holtz-Eakin et al. (1988) with Conley and Taber (2011), although straightforward, does not seem to have been considered elsewhere.
We have considered two alternative methods of inference, one based on asymptotics, and the other based on asymptotics. In addition to these two methods, we can also entertain the possibility that if both and J are large, it may be possible to use the panel technique as in Bai (2009) as well.13 See, e.g., Gobillon and Magnac (2016). The latter two procedures are based on the presumption that the researcher takes the linear factor structure seriously, so they may be more powerful than Andrews (2003)’s test. On the other hand, if a researcher views the linear factor model as just a toy model14 to illustrate the potential problem of difference-in-differences methods, then she would probably be hesitant to discard the synthetic control method, which may be able to accommodate potentially complicated statistical structures that go beyond the linear factor model.
The three methods that we discussed here as possible alternatives are all theoretically valid under some asymptotics. Asymptotic validity does not necessarily imply that any given method performs reasonably for a given finite sample. A serious Monte Carlo comparison of the relative performance of the three alternatives, which is beyond the scope of the current paper, is required to determine a method to be recommended to practitioners.
5. Conclusions
We considered the performance of the permutation test (placebo test) in the context of the synthetic control method. The symmetry assumption, one of the crucial conditions for the validity of the permutation test, may be violated in synthetic control studies. Using Monte Carlo simulations, we show that the size of the permutation tests can be distorted. The results suggest that even with simple DGPs and rather restrictive distributional assumptions on the error term, as long as aggregate shocks are present, the permutation test in its current form is likely to fail and cannot serve as a proper tool for inference with the synthetic control method. Several possible alternatives were discussed. That being said, we should be careful and repeat an anonymous referee’s cautious remark that, while our analysis is from a repeated sampling perspective, the synthetic control literature uses permutation tests in the context of design-based inference, and as such, the permutation tests have exact size.
Acknowledgments
Helpful comments by Alberto Abadie, Bruno Ferman and Guido Imbens are greatly appreciated.
Author Contributions
The authors contributed equally to this work.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program. Journal of the American Statistical Association 105: 493–505.
- Abadie, Alberto, and Javier Gardeazabal. 2003. The Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review 93: 112–32.
- Ahn, Seung C., Young H. Lee, and Peter Schmidt. 2013. Panel Data Models with Multiple Time-Varying Individual Effects. Journal of Econometrics 174: 1–14.
- Andrews, Donald W. K. 2003. End-of-Sample Instability Tests. Econometrica 71: 1661–94.
- Bai, Jushan. 2009. Panel Data Models With Interactive Fixed Effects. Econometrica 77: 1229–79.
- Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. How Much Should We Trust Differences-in-Differences Estimates? Quarterly Journal of Economics 119: 249–75.
- Canay, Ivan A., Joseph P. Romano, and Azeem M. Shaikh. 2017. Randomization Tests under an Approximate Symmetry Assumption. Econometrica 85: 1013–30.
- Conley, Timothy G., and Christopher R. Taber. 2011. Inference with “Difference in Differences” with a Small Number of Policy Changes. Review of Economics and Statistics 93: 113–25.
- Doudchenko, Nikolay, and Guido W. Imbens. 2016. Balancing, Regression, Difference-in-Differences and Synthetic Control Methods: A Synthesis. NBER Working Paper No. 22791. Cambridge, MA, USA: National Bureau of Economic Research.
- Ferman, Bruno, and Cristine Pinto. 2017. Revisiting the Synthetic Control Estimator. New York: Mimeo.
- Fisher, Ronald Aylmer. 1949. The Design of Experiments, 5th ed. Edinburgh: Oliver and Boyd.
- Gobillon, Laurent, and Thierry Magnac. 2016. Regional Policy Evaluation: Interactive Fixed Effects and Synthetic Controls. Review of Economics and Statistics 98: 535–51.
- Hoeffding, Wassily. 1952. The Large-Sample Power of Tests Based on Permutation of Observations. Annals of Mathematical Statistics 23: 169–92.
- Holtz-Eakin, Douglas, Whitney Newey, and Harvey S. Rosen. 1988. Estimating Vector Autoregressions with Panel Data. Econometrica 56: 1371–95.
- Nolan, John. 2015. Stable Distributions—Models for Heavy Tailed Data. Boston: Birkhauser.
- Wikipedia Contributors. 2017. Stable Distribution. Wikipedia, The Free Encyclopedia. Available online: https://en.wikipedia.org/w/index.php?title=Stable_distribution&oldid=808375411 (accessed on 10 November 2017).
1. Doudchenko and Imbens (2016) also consider a slightly more general requirement . This is a sensible way to enhance accuracy of synthetic control viewed as a point estimator. It also provides a link to the difference-in-differences estimator. Because our focus is on inferential aspects of the problem, we simplify notation and analysis by abstracting away from the intercept term.
2. Using the notation consistent with this paper, Equation (1) in Abadie et al. (2010) takes the form , so the factor structure in Equation (3) of this paper is a special case of Equation (1) in Abadie et al. (2010), where , and , i.e., it is a special case where the does not exist and the first element of is time invariant.
3. Conley and Taber (2011), who proposed a similar test, cite Bertrand et al. (2004) when they discuss placebo tests. Abadie et al. (2010) reference many other papers that precede Bertrand et al. (2004).
4. The same test was first discussed by Hoeffding (1952), which is a generalization of the randomization test proposed by Fisher (1949).
5. We are using the fact that the symmetry implies the equality of marginal distributions, and therefore, the lack of equality of marginal distributions is a sufficient condition for violation of symmetry.
6. See Nolan (2015), or Wikipedia contributors (2017).
7. Abadie et al. (2010) also impose the positivity restriction, i.e., for all J.
8. It is straightforward to prove that under stationarity assumption, the only model that allows the synthetic controls to trace the trajectory of the outcome for the treated (i.e., for some ) is a linear factor model with .
9. See Ferman and Pinto (2017) for related discussion on the bias of the synthetic control estimator.
10. We set in Table 2. We also considered the case where . Although the results for this case are not reported here in the paper, they were qualitatively similar to the case. They are available upon request. (When the adding-up constraint was imposed, the two cases gave the same results. Without the adding-up constraint, these two specifications give slightly different results.)
11. This can be done by assuming that and .
12. The results are available upon request.
13. If one were to assume that , the factor model in Equation (3) becomes
    Using the pre-treatment data, one can consistently estimate () and as long as . Using the control outcome for the period along with consistently estimated, one can consistently estimate , which is possible if . Combining as well as , one can make an inference of .
14. Indeed, Abadie et al. (2010) (Section 2.2) consider some other model (in addition to the factor model) for motivation of the synthetic control.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).