Synthetic Control and Inference

We examine properties of permutation tests in the context of synthetic control. Permutation tests are frequently used methods of inference for synthetic control when the number of potential control units is small. We analyze the permutation tests from a repeated sampling perspective and show that the size of permutation tests may be distorted. Several alternative methods are discussed.


Introduction
The synthetic control method, proposed and discussed by Abadie and Gardeazabal (2003) and Abadie et al. (2010), is a very useful way of conducting comparative studies when exact matches are unavailable. Estimation of treatment effects usually takes the form of comparing outcomes between the treated unit and the control unit. Common sense suggests that, for the comparison to be meaningful, the control unit needs to be similar to the treated unit in the absence of the treatment in various dimensions. Such a requirement may not be satisfied in many observational studies. In some cases, availability of panel data makes such comparisons reasonable, the difference-in-differences method being a very well-known example. The difference-in-differences method requires a very specific assumption, namely the common trend assumption, which may not be plausible for many applications. The synthetic control method offers a sensible generalization of the difference-in-differences method. The synthetic control is a linear combination of the potential control outcomes, where the weights are constructed by analyzing the pre-intervention outcomes.
For the purpose of statistical inference with synthetic control, i.e., confidence intervals and hypothesis tests, various versions of placebo tests are often adopted. The idea underlying the placebo tests is that of the usual permutation test, where the critical value of a test statistic is computed under all possible permutations of the "treatment" assignments in the control units.
The idea of the permutation test is very intuitive and attractive. Applying the synthetic control method to every potential control unit presumably allows researchers to assess the distribution of a test statistic under the null hypothesis of no treatment effects, and the inference is seemingly exact in the sense that the burden of asymptotic approximation can be obviated.
The purpose of this paper is very specific. We ask whether the permutation test is a reasonable idea in the context of the synthetic control method, and argue that the intuitive appeal of the permutation test is misplaced. The validity of permutation tests usually requires a certain symmetry assumption, which is often violated in the context of synthetic control studies. Using Monte Carlo simulations, we document the size distortion of the permutation tests. We also discuss a few alternative methods of inference.
Alberto Abadie kindly pointed out that the placebo test in synthetic control is often based on the idea of randomization inference, under which the symmetry restriction is built in, while our analysis is predicated on the usual random sampling perspective, which leads to the violation of symmetry. This perspective is shared by an anonymous referee, who notes that (i) the synthetic control literature uses permutation tests in the context of design-based inference, and, as such, the permutation tests have exact size; (ii) the present article shows that permutation tests may not have correct size under a different mode of inference based on repeated sampling, although interpreting the permutation tests in the previous literature as tests based on repeated sampling would be incorrect; and (iii) the present article also proposes some alternatives that are valid in a repeated sampling setting. It would be useful to understand the exact mechanism through which the difference between the two perspectives manifests itself. The same referee points out that the present paper adopts a setting where $T_0 \to \infty$, while the original literature assumes fixed $T_0$.

Placebo Test and Synthetic Control
In this section, we provide a brief discussion of the placebo test in the context of the synthetic control method. We begin with an overview of the synthetic control, borrowing heavily from discussions in Abadie et al. (2010) and Doudchenko and Imbens (2016). We then move on to describe the placebo test, and point out the importance of the symmetry assumption. We argue that the symmetry assumption is violated in general for placebo tests using linear combinations of outcomes, such as synthetic control. We conclude this section by arguing that such a violation should be expected in general even when a normalized version of the test statistic is adopted.
We start with the overview of the synthetic control method. Consider panel data with $J + 1$ cross-sectional units observed over the time periods $t = 1, \ldots, T$. Units $j = 1, \ldots, J$ are the control units that receive the treatment in none of the time periods. The unit $j = 0$ receives no treatment in periods $1, \ldots, T_0$, and receives active treatment in time periods $t = T_0 + 1, \ldots, T$. For simplicity, we will often assume that $T = T_0 + 1$. The outcome variable $Y_{j,t}$ is such that $Y_{j,t} = Y_{j,t}(1)$ if the $j$th unit receives treatment in time $t$, and $Y_{j,t} = Y_{j,t}(0)$ otherwise. The idea underlying the synthetic control is that if there were some weights $\hat\omega_1, \ldots, \hat\omega_J$ such that

$$Y_{0,t} \approx \sum_{j=1}^{J} \hat\omega_j Y_{j,t} \qquad (1)$$

during the pre-intervention periods ($t = 1, \ldots, T_0$), then $\sum_{j=1}^{J} \hat\omega_j Y_{j,t}$ can be used as a (synthetic) control for $Y_{0,t}$ during the post-intervention periods ($t = T_0 + 1, \ldots, T$). Abadie et al. (2010) and Doudchenko and Imbens (2016) discuss various methods of finding the $\hat\omega$'s so that the requirement in Equation (1) is satisfied. (Doudchenko and Imbens (2016) also consider a slightly more general requirement $Y_{0,t} \approx \alpha + \sum_{j=1}^{J} w_j Y_{j,t}$. This is a sensible way to enhance accuracy of synthetic control viewed as a point estimator. It also provides a link to the difference-in-differences estimator. Because our focus is on inferential aspects of the problem, we simplify notation and analysis by abstracting away from the intercept term.)

We analyze the weights and the nature of approximation from the asymptotic perspective where $T_0 \to \infty$. Note that a special case of the estimator discussed by Abadie et al. (2010, p. 496) [...] (2)

Under our interpretation, $\bar Y_j = T_0^{-1} \sum_{t=1}^{T_0} Y_{j,t}$ above is an estimator of $E[Y_{j,t}]$. Consider the linear factor structure as in Abadie et al. (2010):

$$Y_{j,t}(0) = \alpha_j + \theta_t + \gamma_j' \delta_t + \varepsilon_{j,t}. \qquad (3)$$

Suppose that $\theta_t$, $\delta_t$, and $\varepsilon_{j,t}$ satisfy strict stationarity. Without loss of generality, we also assume that $E[\delta_t] = 0$ and $E[\varepsilon_{j,t}] = 0$.
We would have $\bar Y_j \to \alpha_j + E[\theta_t]$ in probability as $T_0 \to \infty$. Assuming that $\omega^*_j$ satisfies

$$\alpha_0 + E[\theta_t] = \sum_{j=1}^{J} \omega^*_j \left( \alpha_j + E[\theta_t] \right), \qquad (4)$$

we can understand that the population version of the synthetic control gap $Y_{0,t}(0) - \sum_{j=1}^{J} \omega^*_j Y_{j,t}(0)$ is designed to have mean zero. Our $T_0 \to \infty$ asymptotic interpretation is not the only possible one. Doudchenko and Imbens (2016) provide an in-depth analysis of many possible methods. Our interpretation, however, is helpful for two reasons. First, it gives a concrete interpretation of the $\hat\omega$'s as estimates of some pseudo-parameters, say the $\omega^*$'s, along with analytic expressions for the $\omega^*$'s, which makes it easy to understand the potential pitfalls of permutation methods afterwards. Second, it helps us to motivate alternative methods of inference exploiting time series variation.
We now discuss how placebo tests can be used in the context of synthetic control. For this purpose, we first present a summary of the placebo tests/permutation tests. The tests are motivated to deal with the case where the number of the treated is small and the number of controls is relatively large. In order to focus on the salient feature of the tests, we will consider an extreme case and assume that there is only one treated unit.
The basic intuition underlying the general placebo test can be gleaned by examining a standard textbook case of randomized treatments. Suppose that there is cross-sectional data with $J + 1$ units, where the units $j = 1, \ldots, J$ are the control units and the unit $j = 0$ receives the active treatment. A reasonable estimator of the treatment effect is the difference $Y_0 - \bar Y$, where $Y_0$ is the outcome of the unit $j = 0$, and $\bar Y = J^{-1} \sum_{j=1}^{J} Y_j$ denotes the average of the outcomes of the controls. Suppose that we are interested in testing whether the treatment had an impact. Given that there is only one treated unit, the standard t-test comparing the difference of the mean outcomes is not applicable. On the other hand, common sense suggests that we may implement such a test by "assigning" each control unit to a fictitious treatment. More precisely, one can estimate the empirical distribution of $Y_k - (J - 1)^{-1} \sum_{j \neq k} Y_j$ for $k = 1, \ldots, J$, and use it as if it were the distribution of the treatment effect under the null hypothesis.

Implementation of the placebo test with synthetic control requires a bit more notation. First, let $\hat\omega = (\hat\omega_1, \ldots, \hat\omega_J)$ denote the estimator of $\omega^* = (\omega^*_1, \ldots, \omega^*_J)$. Although we will use the method of exact balancing later in our Monte Carlo simulations, we do not need to restrict ourselves to this particular estimator. For now, we can view $\hat\omega$ as an output from a blackbox and let $\omega^*$ denote its probability limit as $T_0 \to \infty$. Second, let $\hat\omega(-k)$ denote the output of the same blackbox except that we use the $k$th unit as the treated unit, and $Y_{j,t}$ with $j \neq k$ as our control units. The placebo test then uses the empirical distribution of $Y_{k,T_0+1} - \sum_{j \neq k} \hat\omega(-k)_j Y_{j,T_0+1}$ for $k = 1, \ldots, J$ as if it were the distribution of the treatment effect under the null hypothesis of no treatment effect. If the estimated effect $Y_{0,T_0+1} - \sum_{j=1}^{J} \hat\omega_j Y_{j,T_0+1}$ belongs to the extreme tails of the empirical distribution, it is understood to be evidence that the null hypothesis is incorrect.
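The placebo procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`placebo_gaps`, `placebo_p_value`, `equal_weights`) are hypothetical, the weight estimator is passed in as a blackbox, and the two-sided p-value rule is one common convention rather than the rejection rule used in the paper. With `equal_weights`, the placebo gaps reduce to the textbook statistic $Y_k - (J-1)^{-1} \sum_{j \neq k} Y_j$.

```python
import numpy as np

def placebo_gaps(Y_pre, Y_post, weight_fn):
    """Gap for the treated unit and placebo gaps for each control.

    Y_pre  : (T0, J+1) pre-intervention outcomes; column 0 is the treated unit.
    Y_post : (J+1,) outcomes in period T0 + 1.
    weight_fn(target_pre, donors_pre) -> weights, one per donor column (the blackbox).
    """
    n_units = Y_pre.shape[1]
    gaps = np.empty(n_units)
    for k in range(n_units):
        # Donors are always control units (1..J), excluding unit k itself.
        donors = [j for j in range(1, n_units) if j != k]
        w = weight_fn(Y_pre[:, k], Y_pre[:, donors])
        gaps[k] = Y_post[k] - Y_post[donors] @ w
    return gaps[0], gaps[1:]

def placebo_p_value(treated_gap, placebo):
    # Assumed convention: share of placebo gaps at least as extreme as the treated gap.
    return np.mean(np.abs(placebo) >= abs(treated_gap))

def equal_weights(target_pre, donors_pre):
    # Simplest blackbox: every donor gets the same weight (textbook case).
    n_donors = donors_pre.shape[1]
    return np.full(n_donors, 1.0 / n_donors)
```

Any other weight estimator with the same signature can be substituted for `equal_weights` without changing the rest of the procedure.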
In order to understand the size property of the placebo test, it helps to recall that the placebo test is a version of the permutation test, which requires for its validity what may be called the symmetry assumption. For a review of this property, we will borrow the short discussion in Canay et al. (2017). Suppose that a researcher observes a vector of observations $X$, whose joint distribution is $P$. The objective is to test whether $P \in \mathbf{P}_0$, where $\mathbf{P}_0$ is a collection of probability distributions such that the distribution of $X$ is equal to that of $gX$ for every $g$ in $G$, where $G$ is a finite collection of transformations. The permutation test has exact size if, for the test statistic $T(X)$, the critical value is taken from the distribution of $T(gX)$ over $g$ in $G$. In the context of the placebo test above, one can understand $X$ to be the vector $(Y_1, \ldots, Y_J)$, and $gX$ to be a permutation of the $Y$'s.
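We note that the symmetry is not mathematically obvious in the context of synthetic control. In order for the permutation test to be valid, it is necessary for the distributions of $Y_{0,T_0+1} - \sum_{j=1}^{J} \hat\omega_j Y_{j,T_0+1}$ and $Y_{k,T_0+1} - \sum_{j \neq k} \hat\omega(-k)_j Y_{j,T_0+1}$ for $k = 1, \ldots, J$ to be identical. Even for the relatively simple model in Equation (3), the nature of the synthetic control is such that the symmetry does not naturally follow. Using the restriction in Equation (4), we may write

$$Y_{0,T}(0) - \sum_{j=1}^{J} \omega^*_j Y_{j,T}(0) = \Big(1 - \sum_{j=1}^{J} \omega^*_j\Big)\big(\theta_T - E[\theta_T]\big) + \Big(\gamma_0 - \sum_{j=1}^{J} \omega^*_j \gamma_j\Big)' \delta_T + \Big(\varepsilon_{0,T} - \sum_{j=1}^{J} \omega^*_j \varepsilon_{j,T}\Big). \qquad (5)$$

Even if the first two terms on the right-hand side of Equation (5) were identically equal to zero over the permutations, we believe that the third term is not likely to satisfy the symmetry property. This is because, under the further restriction that the $\varepsilon$'s have a finite variance, the term can be symmetric only when they are normally distributed.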
We show that normality is necessary if the distribution of the error term $\varepsilon_{0,T} - \omega^{*\prime} \varepsilon_T$ in Equation (5), where $\varepsilon_T = (\varepsilon_{1,T}, \ldots, \varepsilon_{J,T})'$, is to be symmetric up to normalization. Suppose that $\varepsilon_{0,T}, \ldots, \varepsilon_{J,T}$ are i.i.d., and their common distribution is such that the variance is finite and the characteristic function does not vanish. If $\omega$ is a nontrivial function of the $\alpha$'s and $\gamma$'s, then symmetry over the permutations requires that the marginal distributions of $\varepsilon_{k,T} - \sum_{j \neq k} \omega^{(-k)}_j \varepsilon_{j,T}$ for $k = 0, \ldots, J$ should remain invariant over all possible $\omega^{(-k)}$'s. Without loss of generality, we can focus on the distribution of $\varepsilon_{0,T} - \omega' \varepsilon_T$, and conclude that the symmetry requires that there exists a random variable $Y$ such that the distribution of $\varepsilon_{0,T} - \omega' \varepsilon_T$ is the same as that of $cY$ for some scalar $c$. Because the standard deviation of $\varepsilon_{0,T} - \omega' \varepsilon_T$ is proportional to $\sqrt{1 + \omega'\omega}$, we may without loss of generality take $c = \sqrt{1 + \omega'\omega}$. This implies that the distribution of $\omega' \varepsilon_T$ only depends on $\omega'\omega$. In other words, for $\omega \neq \tilde\omega$ such that $\omega'\omega = \tilde\omega'\tilde\omega$, the distribution of $\omega' \varepsilon_T$ is identical to that of $\tilde\omega' \varepsilon_T$. In particular, let all components of $\tilde\omega$ be zero except for the first one. Then, the distribution of $\omega' \varepsilon_T$ is identical to that of $\tilde\omega' \varepsilon_T = \tilde\omega_1 \varepsilon_{1,T} = \sqrt{\omega'\omega}\, \varepsilon_{1,T}$. This implies that $\varepsilon_{j,T}$ should have a stable distribution. Because the only stable distribution with a finite variance is the normal distribution, we conclude that normality is a necessary condition for the symmetry (up to normalization).

Note that the third term in Equation (5) arises in an ideal situation where the weights $\omega$ do not need to be estimated and the first two terms completely disappear. Our analysis suggests that even if we normalize the third term by its standard deviation, the symmetry requires a normal distribution. The necessity of the normality assumption holds for any linear combination, so it applies a fortiori to synthetic control.
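A small numerical illustration of why non-normal errors break symmetry even after normalization: for i.i.d. errors with a finite fourth moment (an extra assumption made only for this sketch) and excess kurtosis $\kappa$, the excess kurtosis of $\varepsilon_{0,T} - \omega'\varepsilon_T$ is $\kappa \sum_i a_i^4 / (\sum_i a_i^2)^2$ with $a = (1, -\omega')'$, by additivity of cumulants. Two weight vectors with the same $\omega'\omega$ therefore give differently shaped distributions unless $\kappa = 0$, as for normal errors. The function name below is hypothetical.

```python
import numpy as np

def excess_kurtosis_of_gap(omega, kappa):
    """Excess kurtosis of eps_0 - omega' eps when the eps are i.i.d.
    with excess kurtosis `kappa` (finite fourth moment assumed)."""
    # Coefficients of the linear combination: 1 on eps_0, -omega_j on eps_j.
    a = np.concatenate(([1.0], -np.asarray(omega, dtype=float)))
    return kappa * np.sum(a**4) / np.sum(a**2) ** 2

# Two weight vectors with the same squared norm omega'omega = 0.5.
omega_spread = np.array([0.5, 0.5])
omega_single = np.array([np.sqrt(0.5), 0.0])

kappa_uniform = -1.2  # excess kurtosis of a uniform distribution
k_spread = excess_kurtosis_of_gap(omega_spread, kappa_uniform)
k_single = excess_kurtosis_of_gap(omega_single, kappa_uniform)
```

Here `k_spread` and `k_single` differ, so the two normalized gaps cannot share a distribution under uniform errors, whereas with `kappa = 0` both values are zero.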

Monte Carlo
The discussion at the end of the previous section casts doubt on the placebo test, even for the simple case where the first two terms in Equation (5) can be ignored. In order to understand the roles that the first two terms may play, we adopt Monte Carlo simulations. We try to find data generating processes (DGPs hereafter) that generate a large amount of size distortion. This is helpful in understanding the potential problem of the placebo test from the uniformity perspective; after all, the mathematical definition of the "size" of a test is the maximum probability of rejection under the null, and here the null hypothesis is a composite hypothesis where the only requirement on the DGP is that the treatment effect is zero, which allows many possibilities for the terms in Equation (5). For this purpose, we found it most convenient to work with the first two terms in Equation (5), although we acknowledge that there may be other important sources of size distortion that we have not explored. Since the last paragraph of Section 2 showed that normalization does not abate the symmetry requirement, we examine the importance of the first two terms in Equation (5) using a more natural statistic.

The version of the synthetic control that we use in the Monte Carlo is the method of exact balancing, the population version of which minimizes

$$\omega'\omega \quad \text{s.t.} \quad \text{(i) } \alpha'\omega = \alpha_0, \quad \text{(ii) } \iota'\omega = 1, \qquad (6)$$

where $\alpha = (\alpha_1, \ldots, \alpha_J)'$ and $\iota$ is a vector of ones. The method of exact balancing may not be an ideal version of the synthetic control, but it reflects a certain ambiguity in the method of synthetic control. In the factor model in Equation (3), it is impossible to find weights $\omega$ such that $Y_{0,t} = \sum_{j=1}^{J} \omega_j Y_{j,t}$ for every $t = 1, \ldots, T_0$, if $T_0$ is large enough, as long as $\varepsilon_{j,t}$ is continuously distributed. In other words, the condition (2) in Abadie et al. (2010) is incompatible with the factor model unless $\mathrm{Var}(\varepsilon_{j,t}) = 0$. The assumption $\mathrm{Var}(\varepsilon_{j,t}) = 0$ has at least two implications.
First, the weights $\omega_j$ can be estimated without error with sufficiently large $T_0$. Second, the distribution of the permutation test would have a point mass at zero, and as such, there is no reason to conduct any test. Both implications are questionable. In any case, under the assumption $\mathrm{Var}(\varepsilon_{j,t}) = 0$, the weights can be estimated (without error) by the method of least squares that minimizes $\sum_{t=1}^{T_0} \big( Y_{0,t} - \sum_{j=1}^{J} \omega_j Y_{j,t} \big)^2$. If the assumption $\mathrm{Var}(\varepsilon_{j,t}) = 0$ is violated, the method of least squares would be subject to a version of the measurement error problem; the true regressor there is $\alpha_j + \theta_t + \gamma_j'\delta_t$ in Equation (3), and $Y_{j,t}$ plays the role of a regressor with measurement error $\varepsilon_{j,t}$. Note that such a problem is avoided by the method of exact balancing.
We consider the method of exact balancing in this section not because it is necessarily an ideal version of the synthetic control, but because it is a convenient way of examining the impact of the first two terms in Equation (5). As mentioned at the beginning of this section, our analysis at the end of the previous section suggests that the placebo test may have a problem even when these two terms are dismissed, and the purpose of our Monte Carlo exercise is to focus on the potential impact of these two terms. (It is straightforward to prove that, under the stationarity assumption, the only model that allows the synthetic controls to trace the trajectory of the outcome for the treated, i.e., $Y_{0,t} = \sum_{j=1}^{J} \omega_j Y_{j,t}$ for some $\omega$, is a linear factor model with $\mathrm{Var}(\varepsilon_{j,t}) = 0$. See Ferman and Pinto (2017) for related discussion on the bias of the synthetic control estimator.)
Note that the term B(i) is equal to 0 by design here, although it can in principle be different from 0 depending on the DGP and the estimator chosen. We speculate that the placebo test is used in the hope that (a) $Y_{0,T} - \sum_{j=1}^{J} \hat\omega_j Y_{j,T}$ is dominated by the term D(i) above; (b) the four terms A, B(ii), C(ii) and D(ii) above, which reflect the noise of estimating $\omega^*$ by $\hat\omega$, are ignorable; and (c) the two terms C(i) and D(i) more or less satisfy the symmetry property.
We argued in the previous section that the term D(i) is likely to violate the symmetry property. In order to assess the impacts of the other terms, we consider the following variations in DGPs:

1. Vary the values of the $\alpha$'s such that (a) none of the components of $\omega^*$ dominates; (b) only two of the elements are non-zero.
2. Vary the values of the $\gamma$'s such that the unbalanced unobservable factors C(i) (a) disappear; and (b) are present.
3. Vary $T_0$ such that the estimation errors in the weights are (a) prominent; and (b) negligible.

Combinations of the first two variations give us four different DGPs, shown as DGP No. 1 to No. 4 in Table 1.

Table 1 (columns: DGP No.; α's; γ's; Variations)
We considered two versions of the placebo tests. The first one is what might be called a feasible version of the test. Formally, for $j = 0, 1, \ldots, J$, let $Y_j$ be a $T \times 1$ vector of outcomes for the $j$th unit, let $Y = (Y_1, \ldots, Y_J)$, and let $Y_{-j}$ be a $T \times (J - 1)$ matrix that deletes the $j$th column from $Y$. Then, [...] Similar to Equation (6), define the leave-one-out synthetic control weights $\hat\omega_{-j}$ for the $j$th control unit as a solution to

$$\min_{\omega} \omega'\omega \quad \text{s.t.} \quad \text{(i) } \bar Y_{-j}'\omega = \bar Y_j, \quad \text{(ii) } \iota'\omega = 1,$$

where $\bar Y_{-j}$ deletes the $j$th element from $\bar Y$. We likewise define the population counterpart $\omega^*_{-j}$ as a solution to

$$\min_{\omega} \omega'\omega \quad \text{s.t.} \quad \text{(i) } \alpha_{-j}'\omega = \alpha_j, \quad \text{(ii) } \iota'\omega = 1.$$
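The minimum-norm problem with two linear constraints has a closed-form solution, which the following sketch uses. It assumes the constraints take the form $A\omega = b$ with $A$ stacking the donor means and a row of ones, in which case $\omega = A'(AA')^{-1}b$ whenever $A$ has full row rank (the donor means are not all equal). The function names are hypothetical, and the zero diagonal in the leave-one-out matrix is a bookkeeping convention.

```python
import numpy as np

def exact_balancing_weights(donor_means, target_mean):
    """Minimum-norm weights w solving
       min w'w  s.t.  donor_means' w = target_mean,  sum(w) = 1."""
    m = np.asarray(donor_means, dtype=float)
    A = np.vstack([m, np.ones_like(m)])   # 2 x (number of donors)
    b = np.array([target_mean, 1.0])
    # Closed form for the minimum-norm solution of A w = b (A of full row rank).
    return A.T @ np.linalg.solve(A @ A.T, b)

def leave_one_out_weights(control_means):
    """Row j holds the weights that balance control j using the other controls;
    the diagonal entry (weight of unit j on itself) is set to zero."""
    means = np.asarray(control_means, dtype=float)
    J = len(means)
    W = np.zeros((J, J))
    for j in range(J):
        others = np.delete(np.arange(J), j)
        W[j, others] = exact_balancing_weights(means[others], means[j])
    return W

# Example with made-up pre-intervention means for four control units.
control_means = np.array([0.0, 1.0, 2.0, 4.0])
W = leave_one_out_weights(control_means)
```

Passing the $\alpha$'s instead of sample means gives the population weights $\omega^*_{-j}$ under the same formula.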
For $j = 1, \ldots, J$ and $k \neq j$, let $\hat\omega_{-j,k}$ be the element in $\hat\omega_{-j}$ that corresponds to the $k$th control unit. In addition, define $\hat\omega_{-j,j} \equiv 0$ for $j = 1, \ldots, J$. Then, for $j = 1, \ldots, J$, we can compute [...] Let $S_{(1)}, \ldots, S_{(J)}$ be the order statistics of the $S(Y_j, Y_{-j})$'s. We reject [...] The second test is an infeasible version of the test, which is identical to the first test, except that we use the true value of $\omega^*$, i.e., [...] and we reject [...]. For each DGP, we try $T_0 \in \{40, 80, 400, 800\}$, $J \in \{20, 40, 80\}$ and $\sigma^2 = 0.1$. For all designs, we set the level of the tests to be $\alpha = 10\%$, and the number of Monte Carlo runs to be 1000.
The results are summarized in Table 2. We see size distortions in Table 2, especially for DGPs No. 2 and No. 4. The size distortion there cannot be attributed to the noise of estimating $\omega$. First, the problem persists even as $T_0$ approaches unrealistically large values. Second, the size distortion is similar over the feasible and infeasible versions of the test. We suspect that the problem is a fundamental one that may have something to do with the violation of symmetry. (An anonymous referee pointed out that DGPs No. 2 and No. 4 cannot produce synthetic controls that approximate the trajectory of the outcome for the treated, and that synthetic controls should not be applied in those settings.) For this purpose, we revisit the decomposition in Equation (5) of $Y_{0,T_0+1}(0) - \sum_{j=1}^{J} \omega^*_j Y_{j,T_0+1}(0)$, assuming that the first and second terms in the factor model in Equation (3) are not present:

$$Y_{0,T_0+1}(0) - \sum_{j=1}^{J} \omega^*_j Y_{j,T_0+1}(0) = \Big(\gamma_0 - \sum_{j=1}^{J} \omega^*_j \gamma_j\Big)' \delta_{T_0+1} + \varepsilon_{0,T_0+1} - \sum_{j=1}^{J} \omega^*_j \varepsilon_{j,T_0+1}.$$

This implies that the variance of $Y_{0,T_0+1}(0) - \sum_{j=1}^{J} \omega^*_j Y_{j,T_0+1}(0)$ can be written as

$$\Big(\gamma_0 - \sum_{j=1}^{J} \omega^*_j \gamma_j\Big)' \Sigma(\delta_t) \Big(\gamma_0 - \sum_{j=1}^{J} \omega^*_j \gamma_j\Big) + \sigma^2 \big(1 + \omega^{*\prime}\omega^*\big)$$

under the assumptions of the DGPs, where $\Sigma(\delta_t)$ is the covariance matrix of the vector $\delta_t$ and $\sigma^2$ is the variance of $\varepsilon_{j,t}$. Likewise, the variances of the permutation statistics are

$$\Big(\gamma_k - \sum_{j \neq k} \omega^*_{-k,j} \gamma_j\Big)' \Sigma(\delta_t) \Big(\gamma_k - \sum_{j \neq k} \omega^*_{-k,j} \gamma_j\Big) + \sigma^2 \big(1 + \omega^{*\prime}_{-k}\omega^*_{-k}\big), \quad k = 1, \ldots, J.$$

Depending on the relative magnitudes of the $\gamma$'s, we can easily construct examples that violate the symmetry, such as DGPs No. 2 and No. 4. As of now, it is not clear to us whether there is another avenue (other than the variation in the size of $\gamma$) that leads to a violation of the symmetry.

In Table 2, we set $\theta_t \sim N(0, 1)$. We also considered the case where $\theta_t = 0$. Although the results for this case are not reported in the paper, they were qualitatively similar to the $\theta_t \sim N(0, 1)$ case and are available upon request. (When the adding-up constraint was imposed, the two cases gave the same results. Without the adding-up constraint, these two specifications give slightly different results.)
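The asymmetry in variances can be checked numerically. The sketch below assumes that, with the first two terms of the factor model absent, a gap equals $(\gamma_{\text{target}} - \Gamma'\omega)'\delta_t + \varepsilon_{\text{target}} - \omega'\varepsilon$, that the $\varepsilon$'s are i.i.d. with variance $\sigma^2$ and independent of $\delta_t$, and it uses equal weights and made-up loadings purely for illustration. The function name is hypothetical.

```python
import numpy as np

def gap_variance(gamma_target, gamma_donors, omega, sigma_delta, sigma2):
    """Variance of (gamma_target - Gamma' omega)' delta + eps_target - omega' eps.

    gamma_target : (r,) loading of the unit treated as the target.
    gamma_donors : (n_donors, r) loadings of the donor units.
    omega        : (n_donors,) weights on the donors.
    sigma_delta  : (r, r) covariance matrix of delta_t.
    sigma2       : variance of the idiosyncratic errors (assumed i.i.d.).
    """
    loading = gamma_target - gamma_donors.T @ omega  # unbalanced factor loading
    return loading @ sigma_delta @ loading + sigma2 * (1.0 + omega @ omega)

# Hypothetical single-factor example: treated unit has gamma = 0,
# controls have gammas (1, 1, 0, 0); equal weights stand in for omega*.
sigma_delta = np.eye(1)
gamma_controls = np.array([[1.0], [1.0], [0.0], [0.0]])

var_treated = gap_variance(np.array([0.0]), gamma_controls,
                           np.full(4, 0.25), sigma_delta, 0.1)
# Placebo for control 1, using the remaining three controls as donors.
var_placebo = gap_variance(gamma_controls[0], gamma_controls[1:],
                           np.full(3, 1.0 / 3.0), sigma_delta, 0.1)
```

In this example the treated gap and the placebo gap have different variances, so their distributions cannot coincide, which is the kind of asymmetry driven by the relative magnitudes of the $\gamma$'s.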

Possible Alternatives to Placebo Tests
If we take the time series asymptotics ($T_0 \to \infty$) seriously, the problem can be avoided by using the same idea as in Andrews (2003). The hypothesis of no treatment effects can be understood to be a hypothesis of stationarity of the time series $W_t \equiv Y_{0,t} - \sum_{j=1}^{J} \omega^*_j Y_{j,t}$. In particular, the researcher is interested in whether the distribution of $W_{T_0+1}, \ldots, W_T$ is the same as that of $W_1, \ldots, W_{T_0}$, for which the test of Andrews (2003) is well suited. In the simple case that we consider, where $T = T_0 + 1$, one rejects the null if $W_{T_0+1}$ belongs to the extreme tails of the empirical distribution of $W_1, \ldots, W_{T_0}$. We conducted Monte Carlo simulations for all the DGPs considered in the previous section, and verified that the test of Andrews (2003) suffered no size distortion. The test of Andrews (2003) is geared toward application in time series, and as such, is robust to certain forms of heteroscedasticity. If the variances of $\varepsilon_{j,t}$ in Equation (3) were different across $j$'s, most of the available methods exploiting cross-sectional variation may need to be used with caution, as noted by Ferman and Pinto (2017). Because the end-of-sample instability test of Andrews (2003) is a test of stationarity of $Y_{0,t} - \sum_{j=1}^{J} \omega^*_j Y_{j,t}$, its validity does not depend on whether the $\varepsilon_{j,t}$'s have identical variances or not. The usefulness of the test of Andrews (2003) in this context was recognized earlier by Ferman and Pinto (2017).
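For the simple case $T = T_0 + 1$, the rejection rule described above can be sketched as follows. This is a simplified illustration rather than the full procedure of Andrews (2003): the weights are treated as known, the two-sided rule uses empirical quantiles of the pre-intervention gaps, and the function names are hypothetical.

```python
import numpy as np

def synthetic_gap(Y0, Y_controls, omega):
    """W_t = Y_{0,t} - sum_j omega_j Y_{j,t} for every period t.

    Y0         : (T,) outcomes of the treated unit.
    Y_controls : (T, J) outcomes of the control units.
    omega      : (J,) weights, treated as known in this sketch.
    """
    return (np.asarray(Y0, dtype=float)
            - np.asarray(Y_controls, dtype=float) @ np.asarray(omega, dtype=float))

def end_of_sample_reject(W, level=0.10):
    """Reject if the last gap lies in the extreme tails of the earlier gaps.

    W : (T0 + 1,) series of gaps; the final entry is the post-intervention period.
    """
    W = np.asarray(W, dtype=float)
    pre, post = W[:-1], W[-1]
    # Two-sided rule: compare against the level/2 and 1 - level/2 empirical quantiles.
    lower, upper = np.quantile(pre, [level / 2, 1 - level / 2])
    return bool(post < lower or post > upper)
```

The reference distribution here comes from the time series of the treated unit's own gaps, not from the cross-section of placebo units, which is why unequal error variances across units do not matter for this rule.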
Under a strict exogeneity assumption on the $x$'s, we can consistently estimate $(\beta, \delta_2)$ as $J \to \infty$ by using the control group. Now, assume that $(\varepsilon_{j,1}, \varepsilon_{j,2})$, $j = 0, 1, 2, \ldots$, are i.i.d., which would imply [...] The argument of Conley and Taber (2011) establishes that the distribution of $\varepsilon_{j,2}/\delta_2 - \varepsilon_{j,1}$ can be consistently estimated by the empirical distribution of [...], where $(\hat\beta, \hat\delta_2)$ denotes the estimator of Holtz-Eakin et al. (1988). Therefore, in order to test that $\eta$ equals a hypothesized value, it suffices to consider a test that rejects whenever [...] The idea of combining Holtz-Eakin et al. (1988) with Conley and Taber (2011), although straightforward, does not seem to have been considered elsewhere.
We have considered two alternative methods of inference, one based on $T_0 \to \infty$ asymptotics, and the other based on $J \to \infty$ asymptotics. In addition to these two methods, we can also entertain the possibility that, if both $T_0$ and $J$ are large, it may be possible to use the panel technique as in Bai (2009) as well (see, e.g., Gobillon and Magnac (2016)). The latter two procedures are based on the presumption that the researcher takes the linear factor structure seriously, so they may be more powerful than the test of Andrews (2003). On the other hand, if a researcher views the linear factor model as just a toy model to illustrate the potential problem of difference-in-differences methods, then she would probably be hesitant to discard the synthetic control method, which may be able to accommodate potentially complicated statistical structures that go beyond the linear factor model.
[...] reasonably for a given finite sample. A serious Monte Carlo comparison of the relative performance of the three alternatives, which is beyond the scope of the current paper, is required to determine a method to be recommended to practitioners.

Conclusions
We considered the performance of the permutation test (placebo test) in the context of the synthetic control method. The symmetry assumption, one of the crucial conditions for the validity of the permutation test, may be violated in synthetic control studies. Using Monte Carlo simulations, we show that the size of the permutation tests can be distorted. The results suggest that even with simple DGPs and rather restrictive distributional assumptions on the error term, as long as aggregate shocks are present, the permutation test in its current form is likely to fail and cannot serve as a proper tool for inference with the synthetic control method. Several possible alternatives were discussed. That being said, we should be careful and repeat an anonymous referee's cautious remark that, while our analysis is from a repeated sampling perspective, the synthetic control literature uses permutation tests in the context of design-based inference, and as such, the permutation tests have exact size.
[...] $\beta$ is in the extreme tails of such an empirical distribution. Ahn et al. (2013), for example, discussed how the method of Holtz-Eakin et al. (1988) can be generalized when there are multiple factors. The idea of combining Holtz-Eakin et al.

Table 1. Data Generating Processes (DGPs) that generate size distortion.

Table 2. Null rejection rates of permutation tests.
Our Monte Carlo analysis indicates that the placebo test does have a size distortion problem. The results in Table 2 suggest that the size problem is potentially bigger in DGPs No. 2 and No. 4. DGPs No. 2 and No. 4 differ from No. 1 and No. 3 in that the $\gamma$'s are nonzero and the aggregate shock $\delta_t$ plays a role as a consequence. Therefore, it is of interest to investigate further sources of asymmetry.