1. Introduction
It is almost always the case in applied panel data research that some values will be missing, leading to unbalanced panels. Missing observations can happen for various reasons. In macroeconomics, the available data do not start at the same date for all countries, or the frequency of data collection for many variables changed over time; for example, where data were available on an annual basis, now they are also available every quarter, and therefore there are missing values at the quarter dates for the time that only annual observations were available. Firm and bank-level data are plagued by mergers and bankruptcies or the introduction of new banks and firms in the sample. “Windsorising” the data (trimming the outliers), which is standard practice in corporate finance, also creates missing values. In household survey data, many households drop out with time. In financial data, there are missing values on certain days, for example, holidays and weekends.
This paper examines the problem of missing values in panel data unit root testing. Missing observations were first studied in a time series framework with stationary data. Many of the early contributions can be found in
Harvey (
1989), but the first was by
Savin and White (
1978), which examined the
Durbin and Watson (
1950,
1951,
1971) test for serial correlation. Their main result was that ignoring the missing values by closing up the observations leads to a Durbin–Watson statistic that has the same null limiting distribution, and the bounds critical values are still valid.
Bhargava (
1989) examined the impact of missing values on the power of the Durbin–Watson test using an approximation of the power function and found that in the presence of an intercept in the model, and without very large gaps in the data, it is still reasonable to use the test. For non-stationary data, the first contributions were those of
Shin and Sarkar (
1994a,
1994b), which examine the impact of missing values on the instrumental variable unit root tests of
Hall (
1989) and the Dickey–Fuller test of
Dickey and Fuller (
1979).
Shin and Sarkar (
1994b) find that under the null hypothesis of a unit root, the estimator and t-statistics have the same distribution as in the non-missing data case. However, the sampling pattern does affect the power of the tests.
Ryan and Giles (
1998) re-examine the two schemes for dealing with missing observations in Dickey–Fuller unit root tests found in
Shin and Sarkar (
1994b), and propose a third one. The first scheme simply removes the gaps from the series and assumes that the existing observations are continuous. The second scheme replaces the missing values with the last recorded observation before the gap, and the third scheme uses linear interpolation to fill in the missing data by taking the average of two observations, the last one before the gap and the first one after the gap. They first confirm the findings of
Shin and Sarkar (
1994b) that the first two schemes leave the unit root test null distributions unchanged but show that the third scheme introduces additional terms in the limiting distribution. Furthermore, using extensive Monte Carlo simulations they show that the first scheme delivers the highest power, while linear interpolation provides some empirical size gains but significant power loss. However, the second scheme leads to more power in the augmented Dickey–Fuller test which deals with serial correlation (
Dickey and Fuller 1981).
The first contribution of this paper is to extend the popular panel unit root tests of
Harris and Tzavalis (
1999),
Karavias and Tzavalis (
2014) to unbalanced panels. This is the first paper that considers how to adjust panel data unit root tests to unbalanced panels. It is not straightforward to adapt the single time series results in panels because most panel unit root tests require bias corrections which will be affected by the pattern and location of the missing values, see, e.g.,
Levin et al. (
2002),
Im et al. (
2003), among others. The problem is even more serious when the bias correction is estimated by simulations.
The tests of
Harris and Tzavalis (
1999),
Karavias and Tzavalis (
2014) are fixed-
T tests with a wide range of deterministic component specifications including, individual unit intercepts, linear trends and common structural breaks. This allows us to study the impact of missing values on various settings. We focus on panel unit root tests with a large number of cross-section units
N, and a small number of time series observations
T because this was the original dynamic panel data framework introduced by
Holtz-Eakin et al. (
1988) and the framework of the first panel data unit root test, that of
Breitung and Meyer (
1994). It is also one of the most common in terms of applications, see, e.g.,
Karavias et al. (
2021). The panel data unit root tests of
Harris and Tzavalis (
1999),
Karavias and Tzavalis (
2014) are popular in applied research and have been implemented in statistical software.
1 They have several advantages beyond being applicable to short panels; they are invariant to the initial conditions, they allow for flexible and general trend functions, and they allow for cross-section heteroskedasticity.
We adjust the above test statistics for missing values by providing new bias correction formulas and deriving their asymptotic limiting distributions. The adjustment allows for general patterns of missing values that can differ across individuals and leads to excellent test size properties. Under the null hypothesis, the distribution of the adjusted test statistics remains identical to the case without missing values. This result holds for any missing-value correction scheme, unlike the single time series results of
Ryan and Giles (
1998).
The paper’s second contribution is to employ the
Harris and Tzavalis (
1999),
Karavias and Tzavalis (
2014) tests and the fixed-
T framework to study which method for dealing with missing data results in tests with greater power. The power properties of these tests have been studied previously in
Madsen (
2010) and
Karavias and Tzavalis (
2016,
2017). In this paper, we extend the local power functions of
Karavias and Tzavalis (
2016,
2017) to allow for missing values, and then use these functions to theoretically compare the three missing value schemes from
Ryan and Giles (
1998). The results show that the zeroing-out scheme leads to the greatest power for all types of deterministic components. This is also the first paper that examines the nexus between missing values and structural breaks. We find that zeroing-out once more leads to greater power, but the ranking of the other two schemes depends on the relative locations of the missing values and the structural breaks.
The paper is structured as follows.
Section 2 presents the tests of
Harris and Tzavalis (
1999);
Karavias and Tzavalis (
2014).
Section 3 introduces missing values and describes how they can be analysed in the fixed-
T framework.
Section 4 presents the missing-value-adjusted statistics and their limiting distributions under the null and under local alternatives.
Section 5 compares the impact of popular schemes for dealing with missing values, for various deterministic specifications.
Section 6 concludes the paper.
2. Panel Unit Root Tests without Missing Values
Assume that there are
N cross-section units and
T time series observations and consider the following data generating processes:
for
and
. For notational convenience, we further assume that the initial observation is
and it is observed resulting in a total of
time series observations per unit.
Model (
1) is includes individual (or incidental) intercepts and model (2) includes individual intercepts and individual trends. The models in (3) and (4) consider a single structural break in the intercepts and trends of the series, at time
. The break is assumed to be common for all units as in
Bai (
2010). The parameters
and
are the intercept and trend individual effects before the break and
and
are those after the break. The first two models have been considered in
Harris and Tzavalis (
1999), while models (3) and (4) have been considered in
Karavias and Tzavalis (
2014).
The error term
is assumed to be an autoregressive process of order one, as follows:
for
and
. The key parameter of interest is the autoregressive parameter
, which determines the stationarity of the panel process.
For models (
1) and (2) the null hypothesis of non–stationarity is given by
, while the alternative of stationarity is
. For models (3) and (4) the null hypothesis depends on whether there is a structural break under the null or not. Both choices are considered in
Karavias and Tzavalis (
2014) and could have been considered here, however, the results in terms of missing values do not change qualitatively and in the following we will only consider the case where a structural break occurs only under the alternative. Explicitly stated, the null hypothesis is
for (3), and
for (4). We will further assume that the date of the break is known to the researcher, as the missing values analysis also does not change if the date of the break is unknown.
To remove the individual effects from
we employ the annihilator matrices
, where
corresponds to models (
1)–(4). We introduce the following notation: Let
be a
identity matrix,
e be a
vector of ones, and
. Moreover, let
and
be
vectors such that
and
if
and 0 otherwise, and let
and
be
vectors such that
and
if
and 0 otherwise. The vectors
and
are effectively “breaking” versions of
e and
.
is an annihilator matrix with the general formula
, and where
depends on the model. Define
,
,
and
. By premultiplying models (
1)–(4) with the corresponding
, the least-squares estimator of the transformed model is given by:
where
and
.
The estimator (
6) is inconsistent because it suffers from the well known Hurwicz-Nickell bias.
2 Harris and Tzavalis (
1999),
Karavias and Tzavalis (
2014) derive expressions for this bias when
and show that it depends on the deterministic component specification. Furthermore, they show that the bias can be estimated and
be bias-corrected. The following test statistic, and its asymptotic distribution, is then used for testing the null hypotheses:
3
where
is the bias correction and it is given by the probability limit of
.
Harris and Tzavalis (
1999),
Karavias and Tzavalis (
2014) provide explicit formulas for
and
, for models (
1)–(4). For (3) and (4), the expressions of
also depend on the date of the break. This dependence is suppressed in our notation because the date of the break does not have an impact on the theoretical results of the paper.
The first contribution is to examine how the above statistic and its limiting distribution change in the presence of missing values. Missing values are introduced in the next section.
3. Missing Values
So far we have assumed that there are
observations of
for every
. Let the data spawn from
and let there be a missing value at time
, where
. Under the null hypotheses that
for models (
1)–(2) and
for (3), and
for (4), (
1)–(4) imply the following data generating process:
where
. In matrix form, the above equations (presenting only the first, to save space) become:
When there is a missing value , the dynamic nature of the system results in appearing in two equations of the system, the and the . This means that one missing value plagues two equations of the system as can also be seen from the above representation.
There is a fundamental difference in the way that missing values are treated in a fixed-
T framework compared to single time series analysis. In single time series, it is assumed that the data generating process is based on the index
t, as in models (
1)–(4), but only a subset of these observations is available,
. A new pseudo series is created based on the index
, i.e.,
, where
and the estimation of
is based on
. This approach is reasonable when
T is asymptotic because the impact of the index change is asymptotically negligible. When
T is fixed as it is here, the idea is to work with a fixed set of equations, see, e.g.,
Hayashi (
2000, sec. 5.3). We will examine methods that keep the asymptotic distribution of the test the same, but we wish to find which method results in maximum power.
Define to be a deterministic matrix that reshuffles the data to deal with missing values in the unit i. Notice that allows for the pattern and number of missing values to differ across units. If there are no missing values, . If there is a missing value for unit i at time , then , which is how missing values are introduced into the model. Effectively, multiplies the missing value with 0 and we assume that the outcome of this algebra is 0. Because this is a dynamic model and two equations are affected by a missing value at time , it must also be . The rest of the diagonal elements of are equal to 1, while the off-diagonal elements are the ones performing the missing-value correction scheme.
The different schemes for dealing with missing values, see, e.g.,
Ryan and Giles (
1998), such as closing the gaps, using linear interpolation and using the last available observation, dictate different versions of
. In the following section, we will derive the asymptotic distribution of the statistic in (
7) and the asymptotic local power without assuming a specific type of
. This means that the analysis can be used for comparing other methods for dealing with missing values, beyond the ones in
Section 5.
4. Asymptotic Distribution and Local Power Function
The main idea behind the asymptotic analysis comes from
Hayashi (
2000, p. 338), and to demonstrate it we consider (
1), which can be quasi-differenced according to Equation (
5) and be written as
. Stacking the model across the time dimension, it becomes:
Then, we premultiply the model by
, which is the transformation matrix that deals with the missing values in each unit, and apply a second transformation, the within transformation, to remove the deterministic components. The removal of the individual effects is based on:
Notice how
now depends on the individual
i, since individuals are allowed to differ in terms of the number and location of missing values. The general expression of
is
. Then,
is the estimator which minimises the least squares criterion in the transformed model and it equal to:
The asymptotic analysis is based on the following set of assumptions. These are not the weakest assumptions possible, however, they are useful for allowing us to study the problem of missing values analytically.
Assumption 1 (Errors/No Selectivity Bias). (i) , for is a sequence of independent random vectors with and , where . (ii) follows a multivariate normal distribution.
Assumption 2 (Variable Independence). , for and is independent of , , , , , and , and .
Assumption 3 (Invertibility). For , with probability 1 for all possible break points, T and N.
Assumption 4 (Missing Values).
As ,where is positive definite. Assumption 1 is standard in the literature, see, e.g.,
Verbeek and Nijman (
1992). It implies that the errors are not correlated with the missing values, in other words there is no selectivity bias. Still, the individual effects can be correlated with missing observations. The assumption also implies that conditionally on the missing values, the errors are homoskedastic across both the time series and cross-section dimensions and not serially correlated. Cross-section dependence is not allowed, and finally, the second part imposes normality, which helps in simplify the formulas below. Assumption 2 is also standard and necessary for deriving the asymptotic local power function. Assumption 3 is an invertibility assumption that also restricts the presence of the structural breaks to locations that allow the existence of
and
. Effectively, it is a high-level assumption that implies the need for trimming the sample at the beginning and the end. For more information, see also
Karavias and Tzavalis (
2014). Assumption 4, is a regularity condition that allows the existence of the estimator bias in the presence of missing values.
We are now able to present the limiting distribution of in the presence of missing data:
Proposition 1. Let Assumptions 1, 3 and 4 hold. Furthermore, for models (3) and (4) let the date of the break be known. Then, under , and as : The quantities and are defined as:where is such that,and . Finally, Λ is a matrix with elements for and otherwise, for . Proposition 1 derives the asymptotic distribution of
under the null hypothesis and under a general pattern of missing value treatment. The proof of the proposition is based on
Harris and Tzavalis (
1999),
Karavias and Tzavalis (
2014) and is omitted. Expressions (
15) and (
16) make clear how the bias and variance of the estimator should be adjusted in the presence of missing values. The result in (
14) differs from those in
Harris and Tzavalis (
1999),
Karavias and Tzavalis (
2014) in that it contains the matrix
. Proposition 1 demonstrates that the limiting distribution of a statistic adjusted for the missing values remains the same as in the case without missing values. This finding is in line with part of the previous literature, see, e.g.,
Shin and Sarkar (
1994b). Unlike
Ryan and Giles (
1998) however, the limiting distribution does not change in the case of linear interpolation, because the distribution does not depend on the type of
.
When the date of the break is unknown,
Karavias and Tzavalis (
2014) suggest computing the minimum of
for models
, over every permissible break date, as determined by Assumption 2. The limiting distribution in this case will depend on the correlations between the
for different break dates, which will be affected by the missing values. Therefore, the critical values of
Karavias and Tzavalis (
2014) are no longer valid in this case. Instead, one can use the bootstrap proposed in
Karavias and Tzavalis (
2019) to derive the appropriate critical values. The case of unknown breaks will not be pursued in the analysis below because the impact of missing values is qualitatively the same as in the case of known breaks.
To employ the statistic in (
14), it is necessary to a employ consistent estimator of
, which is given by:
and a consistent estimator for
, which is given by:
where
.
The following proposition examines the behaviour of the
statistics under local alternatives, as in
Karavias and Tzavalis (
2016,
2017). The main advantage of the local power theory is that it allows us to examine analytically the impact of each type of missing value correction, and this is more transparent than doing Monte Carlo simulations, where the results depend also on other parameters in the experimental design. The local power function is an approximation of the power function in a
neighbourhood of the null hypothesis. The local alternatives are defined as
, where
, because
N is the only increasing data dimension.
Proposition 2. Let Assumptions 1–4 hold. Furthermore, for models (3) and (4) let the date of the break be known. Then, under , and as : The quantity is given by:where and Θ is a matrix that has elements: if or , and . Proposition 2 states the limiting distribution of
under local alternatives. The proof of the proposition is based on
Karavias and Tzavalis (
2016,
2017) and is omitted. The result is elegant and states that the probability of rejecting the null when it is not true (
) is a monotonic function of
. It suffices therefore to examine the sign and magnitude of
; the larger the
, the more powerful the test.
means that the test has trivial power, while if
, the test is biased.
5. Dealing with Missing Values
In this section we evaluate
for various types of
to see which way of dealing with missing values results in greater test power. We will consider the three schemes which have previously appeared in the single time series literature (see, e.g.,
Ryan and Giles (
1998)): Closing the gaps, filling in the previous available value, and using linear interpolation.
With respect to interpolation, there are various methods can produce estimates of the missing values, but we follow
Ryan and Giles (
1998) and only use the average of the two observations before and after the missing values. There are other methods available, which ultimately introduce a bias-variance trade-off problem, but we leave this direction for future research.
The first option that we examine is zeroing-out the missing observations, which is equivalent to “closing the gaps”. The zeroing-out happens by creating
to be a diagonal matrix with diagonal elements
if observation
is missing, and equal to
for
. The second option in the literature is to substitute the missing value with the previous available value, i.e.,
. This is possible by defining the matrix
as:
Finally, the third option is to substitute the missing value with the average of
and
. This is possible by creating
as:
In unreported Monte Carlo simulations it is found that all three above transformations have size very close to the nominal.
4 It is therefore of interest which transformation results in higher power.
Table 1 below compares the three methods in terms of local asymptotic power, based on the theoretical results of (
21). We consider the case of a single missing value and where
, that is, the missing value appears in all units and at the same place. This is a simplifying assumption, but our interest is not on the power of tests per se, but rather in determining the relative power of the three ways of dealing with missing values. We allow the missing value to appear towards the beginning, the middle, and the end of the sample at times
and
. We further set
, and for the
Karavias and Tzavalis (
2014) we allow the dates of the breaks to take place at the same dates as the missing values.
Table 1 below reports
from Equation (
21), for
. A higher value of
means higher test power.
The results of
Table 1 demonstrate that the
Harris and Tzavalis (
1999) test performs best when the missing value is zeroed-out, and the next best method in terms of power is the linear interpolation. The worse method is substituting the missing value with the previous observation. This is the first result in the panel unit root test literature on the impact and treatment of missing values. The fact that zeroing-out produces the highest power agrees with the single time series results of
Ryan and Giles (
1998). However, they find that linear interpolation performs worst, which is contrary to what is found here, as linear interpolation outperforms the previous value substitution.
In the presence of structural breaks the results are less clear-cut. The zeroing-out method dominates the other two, for all combinations of structural break and missing value dates. However, the ranking between the other two methods depends on the relative location of the missing value and the date of the structural break. If the missing observation, or the next, coincides with the date of the structural break, then the substitution of the last available observation leads to higher power than linear interpolation. For the rest of the cases, linear interpolation leads to greater power.
The presence of linear trends in
results in
in (21). This is the known problem of trivial local power in the presence of incidental trends, see, e.g.,
Moon et al. (
2007) and
Karavias and Tzavalis (
2016,
2017). This result demonstrates that panel unit root tests do not have power in the presence of incidental trends in a neighbourhood of the null hypothesis. For the purposes of the analysis here, this means that the asymptotic local power functions cannot be used to show which method is the best. However, unreported Monte Carlo simulations show that for alternatives far from the null, the results for the case without trends still hold.
The results of
Table 1 allow for a comparison of the relative power of the three missing value methods. The absolute powers can be calculated from
, where
is the cumulative distribution function of the standard normal distribution, and
is the
-level percentile of said distribution. The absolute power gains of zeroing-out over the other methods are greater when
T is smaller,
c is closer to 0, and when the number of missing values is larger.
The above conclusions extend to settings with multiple missing values, and cases where the number and locations of missing values differ across units. To save space, we do not present these results but are available upon request. The
Harris and Tzavalis (
1999),
Karavias and Tzavalis (
2014) tests can accommodate cross-section dependence in the form of an additive time effect, however, that does not change the above analysis or its conclusions.