The Discovery of Long-Run Causal Order: A Preliminary Investigation †

† This paper arises out of a joint project with Søren Johansen and Katarina Juselius: they have been inspiring teachers, candid critics, and true friends. I am grateful for earlier collaboration with Morten Nyboe Tabor and for the skeptical, but invaluable, comments of the guest editors for the special issue, Paolo Paruolo and Rocco Mosconi, and three anonymous referees. A very early version of the paper was presented at the Econometrics Conference, Programme for Economic Modelling, University of Oxford, 1–2 September 2014. I thank the participants for valuable comments.

Abstract: The relation between causal structure and cointegration and long-run weak exogeneity is explored using some ideas drawn from the literature on graphical causal modeling. It is assumed that the fundamental source of trending behavior is transmitted from exogenous (and typically latent) trending variables to a set of causally ordered variables that would not themselves display nonstationary behavior if the nonstationary exogenous causes were absent. The possibility of inferring the long-run causal structure among a set of time-series variables from an exhaustive examination of weak exogeneity in irreducibly cointegrated subsets of variables is explored and illustrated.

theory or practical institutional knowledge or common sense to pick among the equivalent causal orders. In fact, however, empirical evidence can be brought to bear on the choice. When the underlying data-generating processes (DGPs) are causally ordered in such a way that an empirically valid model of them would be over-identified, information about conditional dependence and independence among the variables will in some cases provide information that can be used to distinguish among possible causal orders. This approach has been developed with great sophistication (mainly for non-time-series data) in the so-called graphical-causal-modeling or Bayes-net literature (Spirtes et al. 2000; Pearl 2009). 3 Swanson and Granger (1997) first applied a simple graphical causal search algorithm to the problem of determining the contemporaneous causal structure of an SVAR. Subsequently, more sophisticated algorithms have been applied and shown to be effective in a wide range of circumstances (Demiralp and Hoover 2003; Demiralp et al. 2008; and references therein).
Meanwhile, time-series econometrics discovered the importance of nonstationary processes and the concept of cointegration (Engle and Granger 1987). In light of these developments, the SVAR was reformulated into the CVAR. Throughout the paper, we will consider cases in which we, in fact, know the true DGP, but observe only some part of it. To be clear, our operating assumption is that a complex data-generating process governs the behavior of the economy; and the aim of structural causal modeling is to uncover a (partial) representation of the true DGP that is adequate to pragmatically required levels of detail and precision to support inter alia prediction and counterfactual analysis. 4 A key question will be how much information about the DGP can be recovered from the observables.
Our interest is in long-run identification; so, we will restrict our attention to CVARs, taken to be a reduced form of a part of the economy's unobserved DGP, of the form:

$$\Delta X_t = \Pi X_{t-1} + E_t \qquad (1)$$

where X = [x_1, x_2, ..., x_p] is a vector of variables integrated of degree one (notated I(1)); Π is a p × p matrix of parameters; E = [ε_1, ε_2, ..., ε_p] is a p-element vector of normal residuals distributed E_t ~ N(0, Ω); and t subscripts indicate time. The residuals contain both unobserved causes, which we shall call "shocks," and various sorts of error. The matrix Ω is assumed to be diagonal. This assumption could be justified by economic theory or could result from orthogonalizing the residuals by multiplying through by a matrix that reflects the appropriate contemporaneous causal ordering in the manner that Choleski matrices are frequently used in the SVAR literature, a transformation that would affect the interpretation of the X_t's. If the variables in X are cointegrated (i.e., if a linear combination of nonstationary variables is itself stationary), then Π has reduced rank (r) and may be written as Π = αβ′, where α and β are p × r matrices. Such a CVAR is said to have r cointegrating relations and q = p − r common trends. The rows of β′ contain the cointegrating vectors, while the α matrix contains adjustment parameters. In general, the αβ′ decomposition is not unique, since α and β may take different values, so long as Π = αβ′, and still remain consistent with the observations modeled in Equation (1) (Johansen 1995, p. 71; Juselius 2006, p. 216). Most of the focus in identifying the CVAR has been placed on identifying the cointegrating vectors in β on the basis of prior economic theory.
The goal of this paper is to provide a coherent account of the causal order of a CVAR and to make some preliminary suggestions about how the methods of graphical causal search in conjunction with cointegration analysis could aid in the empirical discovery of its long-run causal structure, as they have already aided in the discovery of the contemporaneous causal structure.

3 "Graphical" (or "graph-theoretic") causal modeling should be the preferred term, as the search methods do not require a Bayesian approach to statistics. For compact treatments of the approach and the basic algorithms, see Cooper (1999) and Demiralp and Hoover (2003).

4 On the general methodology of modeling in relation to the CVAR see Hoover et al. (2008) and Hoover and Juselius (2015).

Graph-Theoretic Causal Order
Where other investigators have mainly focused on the cointegrating relationships encapsulated in β′, we shift the focus to the closely related question of how trends are transmitted among the variables. Ours will be a preliminary investigation and will be restricted to cases in which all variables are I(1) and to DGPs that can be adequately represented by a structural model whose causal order is consistent with a directed acyclical graph.

Graphs and Causal Structure
Several econometricians have given structural accounts of long-run behavior in the CVAR. They have focused mainly on the use of theory to provide the necessary identification (Davidson and Hall 1991; Pesaran and Shin 2002; Pesaran and Smith 1998; Pagan and Pesaran 2008). In contrast to economists' frequent reliance on a priori theory, in the case of stationary data, considerable headway has been made (mostly, but not entirely, outside of economics) in developing graphical causal search algorithms that can narrow the class of admissible identifications, sometimes to a unique scheme (Spirtes et al. 2000; Pearl 2009). As a preliminary to examining how some of these ideas might be extended to the nonstationary case, it will be helpful to review selectively some aspects of graphical causal analysis.
In Simon's (1953) account, a structural model is a system of equations representing mechanisms in the world. 5 Although the account can be generalized considerably (see Hoover 1990; 2001, chp. 3), it will do for our purposes to restrict our attention to linear equations and to treat each equation as the representation of the causal mechanism determining its left-hand-side variable (the effect) in terms of right-hand-side variables (the direct causes). The coefficients on the right-hand-side variables are taken to define the space of interventions in the causal model. Thus, an intervention, for example, to a policy rule might change the numerical value of one of the coefficients in the equation representing the rule. In a well-defined structure, the coefficients could be intervened upon independently of each other.
We analyze a restricted version of the structural approach to causality, in that it does not deal with nonlinearities, such as cross-equation restrictions, that might arise in economic optimization problems or from systemic restrictions, such as may be generated under rational expectations. In part this is a pragmatic choice to deal with the easier case first; in part, it is to maintain tighter contact with the existing graph-theoretic causal search literature; and, in part, it arises from a yet-untested conjecture that considerable empirical progress can be made with respect to long-run cause in a simple framework. The structural approach can nonetheless be further generalized; see, for example, Hoover (1990, appendix; 2001, especially chp. 3) and White and Chalak (2009).
Graph-theoretic causal analysis represents structural systems of equations as a directed graph. The variables form the nodes or vertices of the graph, and edges connect pairs of vertices. Edges come in several forms, but we will use only one: the single-headed arrow "→", which means "directly causes". Direct causes are also referred to as the parents of the effect or child. We restrict ourselves to directed acyclical graphs (DAGs), which are adequate to the typical CVARs found in the macroeconomics literature. Graphical causal modeling is not, however, restricted to DAGs: the literature has also addressed cyclical graphs (for example, graphs in which A causes B, B causes C, and C causes A) and simultaneous graphs (a particularly tight form of cyclicality in which A causes B and B causes A) (see Richardson 1996; Phiromswad and Hoover 2013, and the references therein).

5 Hoover (1990; 2001, chps. 2 and 3) provides a detailed account of Simon's approach and of its generalization to nonlinear systems, including ones with cross-equation restrictions among the parameters.

Graphs and Conditional Independence
The key idea in graph-theoretic accounts of causal structure is the mapping between the causal graph and the probability distribution of the true DGP and its reduced form. The mapping is based on Reichenbach's (1956, p. 156) Principle of the Common Cause: if any two variables, A and B, are probabilistically dependent, then either A causes B (A → B) or B causes A (A ← B) or they have a common cause (A ← C → B). Essentially, the idea behind the principle is that correlation may not be causation, but correlations nevertheless must have a causal explanation. The Principle of the Common Cause is generalized as the "causal Markov condition" (Spirtes et al. 2000, p. 29; see also Pearl 2009, p. 30).
Without going into detail, the graph encodes certain facts of (conditional) probabilistic dependence and independence among the variables. If the data were, in fact, generated by a system of equations corresponding to the graph (as they would be, for example, in a simulation), then the joint probability distribution for those variables would embody the encoded probabilistic relations.
Some key ideas relate graphs to probabilistic independence and dependence. One variable may be a common cause of others and the effects will be rendered probabilistically independent of each other after conditioning on the common cause. Similarly, variables may stand in chains; for example, A → C → B or A ← C ← B. In either case, as with the common cause (A ← C → B), conditioning on the intermediating variable C renders A and B probabilistically independent of each other. In all three cases, C is said to screen (or screen-off) A from B.
The translation of equations into graphs also generates another characteristic pattern of causal graphs. When two or more variables are causes of another variable, then several arrows will point into the effect variable. For example, A → C ← B graphs an equation in which A and B are the causes of C, and C is said to be a collider on the directed path between A and B. If A and B, conditional on their parents, are probabilistically independent and collide at C, they will be probabilistically dependent conditional on C. With stationary data, the presence of colliders helps to orient the arrows in a graph. As we shall see presently (Section 4.2), colliders are also important to the transmission of trends, as they represent points at which new local trends are generated.
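The screening-off and collider patterns just described can be checked in a small simulation. The sketch below is only illustrative: it assumes linear equations with independent Gaussian shocks, a hypothetical coefficient of 0.8 on every arrow, and partial correlation as the measure of conditional dependence. None of these choices comes from the paper.

```python
import math
import random


def corr(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly conditioning on z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so that the run is reproducible
n = 20000               # hypothetical sample size

# Chain A -> C -> B: C should screen off A from B.
a = [rng.gauss(0, 1) for _ in range(n)]
c = [0.8 * ai + rng.gauss(0, 1) for ai in a]
b = [0.8 * ci + rng.gauss(0, 1) for ci in c]
chain_marginal = corr(a, b)
chain_partial = partial_corr(a, b, c)

# Collider A -> C <- B: A and B are independent until we condition on C.
a2 = [rng.gauss(0, 1) for _ in range(n)]
b2 = [rng.gauss(0, 1) for _ in range(n)]
c2 = [0.8 * ai + 0.8 * bi + rng.gauss(0, 1) for ai, bi in zip(a2, b2)]
collider_marginal = corr(a2, b2)
collider_partial = partial_corr(a2, b2, c2)

print("chain:    corr(A,B) = %.3f, corr(A,B | C) = %.3f" % (chain_marginal, chain_partial))
print("collider: corr(A,B) = %.3f, corr(A,B | C) = %.3f" % (collider_marginal, collider_partial))
```

In the chain, the marginal dependence between A and B should vanish (up to sampling error) once C is conditioned on; in the collider, the pattern is reversed, which is the asymmetry that search algorithms exploit to orient edges.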
A final useful concept from graphical causality is causal sufficiency:

Definition 1.
A set of variables is causally sufficient if, and only if, any variable that is excluded from the set directly causes at most one variable within the set (Spirtes et al. 2000, p. 22).
The point of invoking causal sufficiency is that the actual DGP of the economy is more complicated than any model of observable variables that an economist might analyze. When a set of variables is causally sufficient, the excluded variables are not common causes and do not induce probabilistic dependence among the observables, so that it is possible to analyze the subset of variables without loss of causal information. Clearly, causal sufficiency is a very special case that will rarely be strictly true for our models, but that sometimes might be approximately true. When it fails, we necessarily face a latent-variable problem.
Graph-theoretic search algorithms work backward from the data by systematically evaluating conditional dependence and independence relations for subsets of variables statistically and then deducing logically what graph or class of graphs or, equivalently, what econometric specifications could have generated those facts. 6 We investigate the possibility of employing a strategy that was developed for stationary data to infer long-run causal structure using facts about cointegration and weak exogeneity rather than facts of causal dependence and independence.

6 See Cooper (1999), Spirtes et al. (2000, chps. 5 and 6), and Pearl (2009, chp. 2). The Tetrad software package implements the algorithms of Spirtes et al. (2000), as well as additional algorithms, and can be downloaded from Carnegie Mellon University's Tetrad Project website: http://www.phil.cmu.edu/tetrad/.

The nonstationarity of the variables in a system of equations such as Equation (1) may arise in two ways. Consider two distinct DGPs. Assume that there are two sets of variables, X_t and T_t. The first, DGP 1, corresponds to the graph in Figure 1a, a simple chain, and is given by Equations (2) and (3), where the Ts are exogenous I(1) trends and the εs and the ηs are independently and identically distributed (i.i.d.) random shocks. The connection of DGP 1 to the CVAR of Section 2 will become clear presently.

The second distinct DGP, DGP 2, corresponds to the graph in Figure 1b and is given by Equation (4), where the ζs are i.i.d. random shocks.

DGP 1 shows the first of the two ways that variables may display stochastically trending behavior: T trends stochastically, independently of the other variables in the system, because of its fundamental random-walk structure, and transmits that behavior to the Xs; i.e., if the trend (T) did not appear in Equation (2), which was otherwise unaltered (i.e., Φ_XX remaining the same), the system would not contain an autoregressive root of unity and the Xs would be stationary.

7 This question is addressed from a philosophical point of view in Hoover (2015).

Now suppose that the trends are latent in DGP 1, so that we observe only the Xs. To see what is implied for the cointegration of the Xs, we can solve out T to get a reduced form. The resulting system will have reduced rank (=2) and the cointegration space is spanned by two vectors. 8

DGP 2 shows the second way that variables can stochastically trend: here, the Xs trend, not because of an exogenous cause, but because of the fine-tuning of their structural coefficients (cf. Davidson and Hall 1991, p. 239). In particular, the parameters have been chosen specifically to give DGP 2 the same cointegration properties as DGP 1. 9 It is important to understand that DGP 2 is not a reduced form of DGP 1. It is a distinct structural system that happens to have coefficients that give it the same cointegration properties as DGP 1. The fact that its variables display I(1) trends reflects a system property of the model that cannot be reduced to the effect of any variables that would trend without the presence of the others. In contrast, in DGP 1, the Xs trend and are cointegrated because they are driven by the same exogenous I(1) trend; and that would be true whether or not the driving trend (T) were observed or latent.
The I(1) behavior of the variables in DGP 2 depends on the exact values of the elements of Π. It is fragile in the sense that a small change in one of the structural coefficients that does not reflect any change in the causal graph (Figure 1b) can result in the loss of cointegration and of the trend behavior of the Xs. In contrast, DGP 1 is generic in the sense that it is robust to changes in the values of the structural coefficients (i.e., to changes that do not alter the causal graph (Figure 1a)).
To illustrate, suppose that some of the coefficients of DGP 2 are altered, so that Π in Equation (4) takes slightly different values. Now the rank of Π is three, there is no cointegration among the variables, and, indeed, the previously nonstationary X_t are now stationary. 10 In contrast, consider making changes of the same magnitude in the analogous part of the causal structure of DGP 1 in (2). Unlike the case of DGP 2, qualitatively, the cointegration properties remain unchanged: there is still only the one trend, T, in the system. Again, if we take the trend to be latent, then, while the precise values of the cointegrating relationships have changed, the cointegration rank (2) has not.

8 In general, calculation of the cointegrating vector is the equivalent of solving out the Ts from the long-run representation of Equation (2) in which we set ∆X_t and the error terms to zero; specifically, the cointegrating vector is given as $\Phi_{XT\perp}'\,\Phi_{XX}$. The orthogonal complement, indicated by the subscript ⊥, is defined for a full-rank p × r matrix A as a p × (p − r) matrix $A_\perp$ such that $A_\perp' A = 0$; see Johansen (1995, p. 39).

9 Row 3 of Π in DGP 2 is simply the first cointegrating relation from the reduced form of DGP 1 when T is latent, while Row 2 is the second. Row 1 is (−0.01) × the first cointegrating relation + (−0.1) × the second.

10 The eigenvalues of I + Π are 0.70678 ± 0.16146i and 0.98643.

Cointegration in DGP 2 is fragile in the sense that only specific choices of coefficients produce a trend and cointegration, and small deviations from those values can destroy those properties. Cointegration in DGP 1 is generic in the sense that small deviations in coefficients, while they alter precise values of cointegrating relations, nonetheless preserve the cointegration rank (i.e., the number of trends).
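The contrast between fragile and generic cointegration can be reproduced in miniature. The sketch below uses hypothetical two-variable analogues, not the three-variable DGPs discussed above: a fine-tuned system in which Π has rank one only because its coefficients are exactly balanced, and a driven system in which the second variable is an exogenous random walk. All numerical values are assumptions chosen for illustration. A unit root is detected as an eigenvalue of I + Π equal to one.

```python
import cmath


def eigenvalues_of_i_plus(m):
    """Eigenvalues of I + M for a 2 x 2 matrix M, via the quadratic formula."""
    a, b = 1 + m[0][0], m[0][1]
    c, d = m[1][0], 1 + m[1][1]
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]


def has_unit_root(m, tol=1e-9):
    """True if I + M has an eigenvalue equal to one (up to tol)."""
    return any(abs(ev - 1) < tol for ev in eigenvalues_of_i_plus(m))


def max_modulus(m):
    """Largest modulus among the eigenvalues of I + M."""
    return max(abs(ev) for ev in eigenvalues_of_i_plus(m))


# Fine-tuned system (hypothetical): both variables adjust to x1 - x2,
# and the unit root exists only because the rows are exactly proportional.
fine_tuned = [[-0.2, 0.2], [0.1, -0.1]]
fine_tuned_perturbed = [[-0.2, 0.19], [0.1, -0.1]]  # one coefficient nudged

# Driven system (hypothetical): the first variable adjusts toward the second,
# which is an exogenous random walk (its row of zeros).
driven = [[-0.3, 0.3], [0.0, 0.0]]
driven_perturbed = [[-0.3, 0.29], [0.0, 0.0]]  # same-sized nudge

for name, m in [("fine-tuned", fine_tuned),
                ("fine-tuned, perturbed", fine_tuned_perturbed),
                ("driven", driven),
                ("driven, perturbed", driven_perturbed)]:
    print("%-22s unit root: %-5s max |eigenvalue|: %.5f"
          % (name, has_unit_root(m), max_modulus(m)))
```

Under these assumed values, the small perturbation removes the unit root from the fine-tuned system, leaving both variables stationary, whereas the driven system keeps its unit root because the zero row of the exogenous trend is untouched by changes to the other coefficients.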
The generic nature of the cointegration properties of systems like (2) is the result of the trend behavior of the X's having an independent cause based in exogenous variables that are fundamentally I(1), while the fragility of cointegration in a DGP like (4) is the result of it arising only from the fine-tuning of the structural coefficients. Such fine-tuning could arise in specific cases for good economic reasons; however, in the spirit of Reichenbach's Principle of the Common Cause, we should assume that it would not be the general case, unless we can point to an economic explanation of why the structural coefficients take those specific values in a particular case. 11 It is unlikely that cointegration generally arises from a fortuitous combination of coefficients, which combined with the fact that we often find cointegration among the observable variables without any of them being weakly exogenous, suggests that the source of nonstationary behavior and cointegration among observable variables is more typically the result of latent I(1) trends.
In DGP 1, we can point to specific variables that are the source of the trends. In this case, we will say that the variables are driven by genuine (or real) fundamental trends, whether those trends are themselves observed or are latent. It is conventional in the CVAR literature to say that any system of I(1) variables with reduced-rank contains trends equal to the number of variables in the system less the number of cointegrating relations (the rank). These trends may generally be represented as the cumulation of the permanent shocks to the CVAR, which are backed out of the shocks to the Xs by imposing identifying assumptions (see Juselius 2006, chp. 15, especially Section 15). These representations are generally not uniquely identified, even when there are latent fundamental trends in the DGP and even when, as in DGP 2, there are no fundamental trends at all. In either case, we might call them "virtual trends," since they do not correspond to a particular variable-observable or latent.
Our working hypothesis is that trending behavior originates economically in a relatively small number of variables whose own natures are such that they are nonstationary; we call these "fundamental trends." The number of fundamental trends causally influencing a set of variables is equal to q (the number of variables (p) minus the rank of the Π matrix (r)). However, the fundamental trends themselves may or may not be among the observed variables. Other variables may be nonstationary, because these fundamental trends are among their direct or indirect causes; we call these "ordinary (nonstationary) variables". In most cases, it would seem that we observe only ordinary variables, and the ultimate source of their trending behavior is to be found among their latent causes.
It might be argued that DGP 1 is also fragile because a change of parameters that rendered any of the Ts stationary would upset the cointegration properties of that model in the same way that those of DGP 2 are upset by a small change in coefficients. However, that would miss the essential point. Of course, if the exogenous Ts were not I(1), then there would be no trends to transmit. The argument here, however, is that, in a structural model, it is far more likely that the source of a trend is a particular I(1) variable, either observed or latent, than that the source would be a group of distinct structural equations that just happen to have the right coefficients to generate what very often are multiple I(1) trends. This is ultimately not an econometric argument, but an economic one: we can more easily think of good economic reasons that a single economic variable might be a random walk (or a random walk with a drift or a random walk with a deterministic trend) than we can think of good reasons that the parameters of several equations are appropriately tuned. For example, common sense and experience suggest that it is highly unlikely that a small change in the relative weights that a central bank places on inflation and unemployment in its reaction function would fundamentally change the cointegration properties of a system of structural macroeconomic equations. If we do not observe such instability of the cointegration properties and we most often do not find observed exogenous I(1) variables, then it suggests that typical estimated CVARs are reduced forms and that we will have to dig deeper to discover the structure that lies behind them. Ultimately, this is an empirical hypothesis about whether CVARs based on structures like DGP 1 prove to be more economically informative than those based on structures like DGP 2.
Our goal is to explore some of the implications of this hypothesis about the typical origin of I(1) trends for the long-run causal structure of the world and for the possibilities of uncovering that structure (or, at least, parts of it) empirically. 12

Graphical Analysis of the CVAR
The DGP that adequately represents the long-run causal structure in the economy is not directly observable. But might it be inferred on the basis of data and not simply imposed as a priori restrictions on the CVAR? We begin by showing, first, how a DGP can be represented as a causal graph; and, second, how we can think of that graph as a map of the transmission of trends through the system of variables. We then want to investigate whether the facts of cointegration and weak exogeneity among subsets of observable variables might provide the necessary empirical data to allow us to recover reliable information about the underlying DGP, analogously to the way in which graphical causal search algorithms allow us to infer the causal structure of stationary data from empirical evidence about probabilistic dependence and independence among subsets of variables. The two critical tools are Davidson's (1998) analysis of irreducibly cointegrating sets of variables and Johansen's (2019) state-space analysis of the CVAR, which provides an instrument for analytically determining weak exogeneity among subsets of variables. These tools allow us to explore the logic of causal inference for nonstationary data. In Section 5, we demonstrate applications of that logic that suggest a possible basis for a causal search algorithm.

The Canonical CVAR of a Causally Sufficient, Acyclical Graph
Consider first the long-term structure of a causally sufficient CVAR with an acyclical causal structure in which the fundamental trends are represented explicitly. In the remainder of the paper, we consider only cases with a strong form of acyclicality in which we do not permit any feedback from one variable to another, even with a time delay. Thus, we rule out cases such as X_t → Y_{t+1} → X_{t+2}.
The parameter matrix Ψ of the system can be partitioned as

$$\Psi = \begin{bmatrix} \Psi_{XX} & \Psi_{XT} \\ \Psi_{TX} & \Psi_{TT} \end{bmatrix}$$

Because X is the vector of ordinary variables, Ψ_XX is full rank and the eigenvalues of I_p + Ψ_XX must be less than one in absolute value. 13 If the variables in T are the actual I(1) fundamental trends, as opposed to ordinary variables that serve as the conduits of the fundamental trends into the observable system, they must be mutually causally independent, requiring Ψ_TT = 0_qq, and strongly exogenous, requiring Ψ_TX = 0_qp (Johansen 1995, p. 77; Juselius 2006, p. 263).
The Ψ matrix in (5) can be decomposed analogously to the Π matrix in (1) such that Ψ = αβ′, where α is (p + q) × r and β′ is r × (p + q). The transitional causal structure embedded in Ψ that governs the transmission of shocks and ultimately determines the long-run causal structure reflected in (25) can be represented in this αβ′-decomposition in the following canonical way: variables that are both cointegrated and directly causally connected are represented by the individual cointegrating relations expressed in β′, and the effects of causes are indicated by non-zero coefficients in α. To take a concrete example, consider a specific causal structure embedded in a DGP like (5) and represented graphically in Figure 2. (With causal time-series graphs, we suppose henceforth that the arrows correspond to a one-period lag between a direct cause and its effect.) Thus, the causally canonical representation of Figure 2 is given by DGP (7).

The rules governing the translation of Figure 2 or any graph into a DGP analogous to (7) are straightforward:

i. Each single-variable direct causal pair or each collider is represented by a cointegrating relationship corresponding to a unique row of the β′ matrix where the value of the parameter for the effect is normalized to unity;
ii. There are as many adjustment parameters in α as there are rows in β′ (at most one per row), with the column of each non-zero parameter in α corresponding to the row of one of the effects (i.e., corresponding to the row in which that variable is normalized to unity) in β′;
iii. If any variable is a cause, but not an effect with respect to all the other variables, it corresponds to a zero row in α (and, thus, is weakly exogenous).

The β′ matrix thus tells us which variables are related causally and, therefore, connected by edges, and the α matrix (equivalently the normalization of β′) tells us which way the arrows point for those edges.
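Rules i to iii can be sketched as a mechanical translation from a graph to the pair (α, β′). The example below is a hedged illustration only: the graph (T1 → A, T2 → B, A → C ← B, C → D), the long-run coefficients, and the adjustment parameters are all hypothetical and are not those of Figure 2 or DGP (7). The function name `canonical_alpha_beta` is likewise introduced here for illustration.

```python
def canonical_alpha_beta(variables, parents, adjustment):
    """Build alpha ((p+q) x r) and beta' (r x (p+q)) from a causal graph.

    parents[effect] maps each direct cause of `effect` to a long-run
    coefficient; adjustment[effect] is the effect's adjustment parameter.
    """
    index = {v: i for i, v in enumerate(variables)}
    effects = [v for v in variables if v in parents]
    alpha = [[0.0] * len(effects) for _ in variables]
    beta_t = []
    for j, effect in enumerate(effects):
        # Rule i: one cointegrating relation per causal pair or collider,
        # with the effect normalized to unity.
        row = [0.0] * len(variables)
        row[index[effect]] = 1.0
        for cause, coef in parents[effect].items():
            row[index[cause]] = -coef
        beta_t.append(row)
        # Rule ii: one adjustment parameter per relation, in the effect's row.
        alpha[index[effect]][j] = adjustment[effect]
    return alpha, beta_t


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical graph: T1 -> A, T2 -> B, A -> C <- B (a collider), C -> D.
variables = ["A", "B", "C", "D", "T1", "T2"]
parents = {"A": {"T1": 1.0}, "B": {"T2": 0.5},
           "C": {"A": 0.6, "B": 0.4}, "D": {"C": 2.0}}
adjustment = {"A": -0.5, "B": -0.4, "C": -0.3, "D": -0.2}

alpha, beta_t = canonical_alpha_beta(variables, parents, adjustment)
psi = matmul(alpha, beta_t)

# Rule iii: variables that are causes but never effects have zero rows in alpha.
weakly_exogenous = [v for v, row in zip(variables, alpha)
                    if all(x == 0.0 for x in row)]
n_trends = len(variables) - len(beta_t)

print("weakly exogenous:", weakly_exogenous)
print("cointegrating relations:", len(beta_t), "trends:", n_trends)
for v, row in zip(variables, psi):
    print(v, [round(x, 3) for x in row])
```

In this hypothetical system there are four cointegrating relations (one per effect) among six variables, leaving two trends, and only T1 and T2 have zero rows in α.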
Except for trivial reorderings of the variables and rescalings, the DGP (7) uniquely represents the causal graph in Figure 2. Algebraically, however, the matrices α and β are not unique. They can be rotated to form other pairs (α* and β*) such that Ψ = α*β*′. The αβ′-representation and the α*β*′-representation yield the same value of the likelihood function. The problem of causal search is to find empirical information, other than the value of the likelihood function, that would allow us to select the canonical representation as in DGP (7) that corresponds to the graph of the data-generating process.
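The non-uniqueness is easy to verify numerically. The following sketch assumes a hypothetical three-variable chain T → X1 → X2 with made-up coefficients and an arbitrary invertible matrix Q; it forms α* = αQ and β*′ = Q⁻¹β′ and confirms that the product is unchanged while the sparse, graph-like pattern of β′ is lost.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(q):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    det = q[0][0] * q[1][1] - q[0][1] * q[1][0]
    return [[q[1][1] / det, -q[0][1] / det],
            [-q[1][0] / det, q[0][0] / det]]


# Hypothetical canonical form for T -> X1 -> X2, variables ordered (X1, X2, T).
alpha = [[-0.5, 0.0],    # X1 adjusts to the first relation
         [0.0, -0.3],    # X2 adjusts to the second relation
         [0.0, 0.0]]     # T is weakly exogenous
beta_t = [[1.0, 0.0, -1.0],   # X1 - T
          [-0.8, 1.0, 0.0]]   # X2 - 0.8 X1

q = [[1.0, 1.0], [0.0, 1.0]]  # arbitrary invertible rotation (an assumption)

alpha_star = matmul(alpha, q)
beta_star_t = matmul(inverse_2x2(q), beta_t)

psi = matmul(alpha, beta_t)
psi_star = matmul(alpha_star, beta_star_t)
max_gap = max(abs(x - y) for r1, r2 in zip(psi, psi_star)
              for x, y in zip(r1, r2))

print("beta'  :", beta_t)
print("beta*' :", beta_star_t)
print("largest difference between the two products:", max_gap)
```

The first row of the rotated β*′ now involves all three variables, so it no longer corresponds to a single edge of the graph, even though both representations imply the same Ψ. The zero row of α for T survives the rotation.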

Formation and Sharing of Local Trends
We can think of the causal graph of a system of I(1) variables as representing the channels of transmission of these trends. Each collider corresponds to the creation of a local trend, and the causal variables involved in the collider are cointegrated with the effect variable. The transmission of a local trend from one variable to a single other variable also implies the cointegration of the cause and the effect.
Although causal connections produce cointegration, cointegration itself is not essentially a causal notion. Instead, cointegration results either (a) when a local trend is shared by two variables or (b) whenever the number of variables sharing the same fundamental trends, whether or not they share the same local trends (i.e., whether or not they share the fundamental trends in the same proportions), exceeds the number of fundamental trends. Thus, in case (b), if there is a set of variables each of which is driven by the same q fundamental trends, then any q + 1 of them will be cointegrated. A causal connection is, thus, sufficient for the cointegration of the complete set of causes with their effect, but it is not necessary.
Proposition 1. Causal Cointegration: If each member of the set of parents of a variable C in a causal graph is I(1), then the set of variables consisting of C and its parents is cointegrated.
It is convenient to write the fact that a set of variables is cointegrated as CI(Z), where Z is a set of variables with two or more members. Thus, if the variables A and B are cointegrated, we can write this as CI({A, B}). Two terms will prove useful:
Definition 2.
A cointegration group is a set of variables in which every pair of variables shares the same common local trend (i.e., every pair is cointegrated).

Definition 3.
A collider group is a set of variables consisting of a variable C and the complete set of its parents.
The variables in a cointegration group share a single common local trend, while the variables in a collider group generate a new local trend at C. The same variable may be part of both a cointegration group and a collider group. Other sets of cointegrating variables may be in neither type of group. Davidson (1998, p. 91) introduces a useful concept, which we define here slightly differently than he does.

Definition 4.
A set of variables is irreducibly cointegrating (notated IC(.)) if, and only if, it does not contain a subset that is itself cointegrated.
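The definitions above can be made concrete with a small sketch. It rests on a simplifying assumption that is not in the text: each variable is summarized by a hypothetical vector of long-run loadings on q = 2 fundamental trends, and a set is treated as cointegrated exactly when those loading vectors are linearly dependent. The names `loadings`, `is_cointegrated`, and `is_irreducibly_cointegrated` are illustrative, and irreducibility is read as "cointegrated, with no cointegrated proper subset".

```python
from itertools import combinations
import numpy as np

# Hypothetical loadings on two fundamental trends (illustrative numbers),
# mimicking T1 -> A, T1 -> B <- T2, T2 -> C, B -> D.
loadings = {
    "T1": (1.0, 0.0),
    "T2": (0.0, 1.0),
    "A": (0.5, 0.0),
    "B": (0.7, 0.3),
    "C": (0.0, 0.8),
    "D": (1.4, 0.6),   # proportional to B: D carries B's local trend
}

def is_cointegrated(names):
    """True if some linear combination of the variables removes every trend."""
    rows = np.array([loadings[n] for n in names])
    return len(names) >= 2 and np.linalg.matrix_rank(rows) < len(names)

def is_irreducibly_cointegrated(names):
    """Cointegrated, and no proper subset of two or more members is cointegrated."""
    if not is_cointegrated(names):
        return False
    return not any(
        is_cointegrated(subset)
        for size in range(2, len(names))
        for subset in combinations(names, size)
    )
```

Under these assumptions, `("T1", "T2", "B")` is irreducibly cointegrated, whereas `("A", "B", "D")` is cointegrated only because three variables share two trends and is reducible, since `("B", "D")` is already a cointegrated pair.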

A State-Space Analysis of the CVAR
It will prove useful to examine the relationship between weak exogeneity and the causal graph. Weak exogeneity is not in itself a causal property; rather, it is a property related to the manner in which a likelihood function can be decomposed into a conditional and marginal probability distribution under a given parameterization (Engle et al. 1983). Although weak exogeneity is important because it turns out to be the condition that guarantees that the parameters of interest can be efficiently estimated, we are not interested in the current paper in efficient estimation. Rather, we want to show how zero rows in α in the CVAR for subsets of variables, known as "weak exogeneity" conditions, can reveal information about the causal structure of the DGP.
Given a DGP, the weak exogeneity status of its variables will depend on the model we estimate. So, for example, if (7) were the DGP with ψ_ij ≠ 0 and we estimated a CVAR with precisely the form of the DGP with ψ_ij unrestricted, then the variables T1 and T2 would be weakly exogenous in the model for {A, B, C, D, E, T1, T2}_{t+1} given {A, B, C, D, E, T1, T2}_t for the coefficients ψ_ij or (α_ji, β_ij), i = 1, 2, ..., 5, j = 1, 2, ..., 7. Our main interest, however, will be in the case in which only a subset of the variables is observed, leaving other variables in the DGP latent. So, for example, we might consider data generated by (7) but observe only B, C, and E. These variables can be modeled in a CVAR form, but the coefficients of the model will not in general be the same as those of (7), though we could compute them if we knew the DGP. Still, we can ask whether we can decompose the likelihood function of this model, with some unobserved variables, in a manner that renders some of the observed variables weakly exogenous with respect to the coefficients of a conditional model for the remaining observable variables.
We can notate this weak exogeneity using a new symbol "α", which means "is weakly exogenous for" and is to be distinguished from "→," which means "directly causes." Thus, X α Y can be read as "the variables in the set X are weakly exogenous for the coefficients of a CVAR model of Y conditional on X" or, leaving the relativity to a particular set of parameters implicit, "X is weakly exogenous for Y." If we know the causal graph of the DGP, then we can read the various weak exogeneity relationships for models of different subsets of variables from information in the causal graph. As a result, if we can identify weak exogeneity relationships for different subsets, we may be able to work backwards to determine which causal graphs could have generated them. 14 The object of the analysis is to use tests of long-run weak exogeneity in CVARs of the form of Equation (1) applied to only the observable variables to discover restrictions on allowable causal ordering of the underlying DGP (6). Long-run weak exogeneity corresponds to a zero row in the α matrix of the CVAR, so a critical goal is, given a particular DGP, to determine what it implies for the α matrix of a CVAR of the subset of observable variables (Johansen 1995, Section 8.2.1; and Juselius 2006, Section 11.1).
Johansen (2019) provides a state-space analysis of the DGP of a CVAR that allows us to determine analytically what statistical tests of weak exogeneity should find (given sufficient data and so forth) for different subsets of observable variables. Fundamental trends are assumed to be latent. In order to analyze weak exogeneity among subsets of variables, Johansen partitions the ordinary variables X_t = [X_1t, X_2t] into those that are in the subset of interest, X_1t (referred to as the observed), and those outside the subset, X_2t (referred to as the unobserved). Then, rather than partitioning Ψ as in (6), partition
$$\Psi = \begin{bmatrix} M & C \\ 0 & 0 \end{bmatrix},$$
where the submatrices of parameters may or may not coincide with the Ψ_ij, depending on whether any ordinary variables are unobserved. The m × p null element in the lower left-hand corner of the Ψ matrix corresponds to the assumption that the fundamental trends are strongly exogenous, and the m × m null element in the lower right-hand corner indicates that fundamental trends do not cause one another.
The submatrix
$$M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}$$
contains the parameters of the ordinary variables. Only the parameters in M_11 relate exclusively to the p_1 observed ordinary variables, while the other M_ij contain parameters that relate partly or exclusively to the p_2 latent ordinary variables. The submatrix
$$C = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}$$
contains the coefficients in C_1 that relate to the effects of the latent fundamental trends on the observed ordinary variables and those in C_2 that relate to their effects on the unobserved ordinary variables. A state-space representation of DGP (6) can then be given.
$$
\begin{aligned}
\Delta X_{1,t+1} &= M_{11} X_{1t} + M_{12} X_{2t} + C_1 T_t + \varepsilon_{1,t+1}, &\quad (8)\\
\Delta X_{2,t+1} &= M_{21} X_{1t} + M_{22} X_{2t} + C_2 T_t + \varepsilon_{2,t+1}, &\quad (9)\\
\Delta T_{t+1} &= \eta_{t+1}, &\quad (10)
\end{aligned}
$$
where t = 0, 1, ..., n − 1, and T_0 = 0 and X_0 = 0. The shocks are partitioned into those affecting ordinary variables (ε) and those affecting the latent variables (η), with (ε_t, η_t) ~ i.i.d. N_{p+m}(0, Ω), where Ω is diagonal. In keeping with the distinction between ordinary variables and fundamental trends, we assume that the eigenvalues of I_p + M, I_p1 + M_11, and I_p2 + M_22 are less than one in absolute value, so that the source of the nonstationarity of X_t is the fundamental trends rather than its own dynamics. The matrix C represents the proportions of fundamental trends present in observable variables but transmitted to them through latent causal connections and not via causal relationships among the observable variables. Thus, while the non-zero entries of M correspond to the edges in a causal graph, C is not given a direct graphical interpretation. The fundamental trends are embedded in T, but the variables included in T should be regarded as local trends, which may either be latent fundamental trends directly causing the observed variables or latent ordinary variables that carry some linear combination of fundamental trends and cause the observable variables. Therefore, while we have assumed that Ω_η is diagonal, it need not be (and the conclusions about weak exogeneity in the next subsection would be unaffected).
Suppose that the DGP is described as in systems (8)-(10), and we wish to know whether any of the observed variables (X_1t) are weakly exogenous in a CVAR of the observed variables only. This comes down to the question of whether α in that CVAR has any zero rows. Johansen proves that the α of such a CVAR can be written, as in Equation (11), in terms of the conditional variances and the long-run variances of the system; see Johansen (2019, Sections 2 and 3, especially Equations (12) and (13), Theorem 3, and Equation (18)).
In the simpler case, in which all variables are observed (i.e., there are no X_2's), Johansen (2019, Section 3, Case 1) shows that the formula in Equation (11) can be made even simpler, which yields Equation (12).

Weak Exogeneity and Causal Order
Johansen's (2019) state-space representation and his Theorem 2 offer a tool for analyzing weak exogeneity for subsets of variables in the DGP. These, in turn, correspond in systematic ways to facts about the causal structure of the DGP itself. Consider some illustrative cases:
Case 1. Consider the causal graph in Figure 3, in which all ordinary variables are observed and only the fundamental trends are unobserved, so that (12), the simpler formula for α, applies. The DGP in Equations (8)-(10) specializes accordingly, where ω_ii = var(ε_it), i = A, B, C, and an asterisk (*) indicates a non-zero value in the resulting α. 15 The first two rows of α are zero and, therefore, A and B are weakly exogenous for C (i.e., {A, B} α C). Notice that it does not matter what the causal relations are among the observables, since they are encoded in the M_11 matrix, which plays no part in the determination of α in Equation (11). What matters is which variables convey the fundamental trends to the observables.
Case 2. Unfortunately, the simple mapping between weak exogeneity and causal connection suggested by Case 1 does not hold up. Consider Figure 4, which adds the variable D and edges connecting it to other variables in Figure 3. The analysis proceeds just as in Case 1. Again, since all variables are observable, the simpler formula (12) applies. The other relevant matrices of the state-space formulation imply an α that has no zero rows, which, in turn, implies that none of the variables is weakly exogenous. 16 The variables A, B, C, D are cointegrated (CI({A, B, C, D})); but with two fundamental trends and four variables, every three-member subset of the ordinary variables is also cointegrated, implying not IC({A, B, C, D}). This appears to be a robust finding: the parents in a collider are weakly exogenous only when the colliding set is irreducibly cointegrated.
Case 3. It is tempting to think that we might consider an irreducible subset of the variables in Figure 4, such as {A, B, C}, and find the same weak exogeneity relations as we did in Figure 3. That, however, does not work. In analyzing the subset, we are effectively treating D as an unobserved variable; and we must, therefore, apply the more general formula (11), which requires additional information. In the critical elements of the state-space representation of this reduced system, although Ω_η is diagonal by assumption, the off-diagonal elements of V_TT are nonzero. This is the result of D transmitting T2 to the collider at C. The calculation of V (see Equation (11)) conditions {D, T1, T2} on {A, B, C} and, in effect, conditions the independent (distal) causes T1 and T2 on their common (indirect) effect, which induces probabilistic dependence between them. With no zero rows in the resulting α, none of the variables is weakly exogenous. Although D is unobservable in the DGP that actually determines the value of the observable variables, it provides a conduit from the fundamental trends to C that is distinct from the observable conduits, A and B. It is as if the graph of Figure 4 has been transformed into Figure 6, where the dashed arrow indicates a causal connection between T2 and C, mediated by D in the DGP but not observable in the CVAR of the subset {A, B, C}. Unobserved mediating causes, like D, can make an indirect causal connection appear to be direct.
Case 4. In Case 3, weak exogeneity failed to obtain, even though the causal connections were genuine.
It can also happen that weak exogeneity does obtain, even when causal connections are missing. Consider the set {A, B, D}, which is not irreducibly cointegrated. It appears that a mapping between weak exogeneity and causal connections can be established only in irreducibly cointegrated sets.
Case 5. Weak exogeneity may fail to track direct cause. Consider a causal chain: T → A → B → C → D. All four observable variables form a single cointegration group, sharing the single fundamental trend. Note that B α C and that {B, C} form a cointegration group. We might be tempted to conclude that these facts would warrant inferring what is, in fact, true: that B → C. A similar case shows the problem: A α C and CI({A, C}); but, in fact, it is not true that A → C (A is an indirect, but not a direct, cause of C). It is worth showing why it is the case that A α C, as it highlights a subtle issue. We take {A, C} to be observed and {B, D} to be unobserved. The zero row in the resulting α implies that A α C. The result hinges crucially on V_2T being a zero matrix. This is, in turn, implied by the fact that A screens off B and D from T in the graph. Conditioning on the screening variable A, as is done in the calculation of V_2T, renders both B and D probabilistically independent of T.
Using a similar analysis, it is also easy to show that the subset {B, D} displays the same pattern as {A, C}: B α D and CI({B, D}), yet it is not true that B → D. The example shows that we have to be careful in making such inferences, but not that they are hopeless. Note that we can show that A α {B, C, D}; B α {C, D}; and C α D; so that the variables form a nested hierarchy with A at the top. This hierarchy can be reinterpreted as a chain: A α B and all variables lower in the hierarchy; B α C and all variables lower in the hierarchy; C α D; and D is not weakly exogenous for any variable. Such a chain recapitulates the causal graph. The lesson is that, when a variable is weakly exogenous for another variable in a cointegration group, it is a direct cause only if it is adjacent in the sense of sitting at the immediately higher step of the hierarchy.
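The step from the nested hierarchy to the chain can be sketched in code. This is only an illustration of the adjacency reading described above, taking the weak exogeneity facts of the chain example as given; the dictionary `weakly_exogenous_for` and the function `adjacent_edges` are hypothetical names, not part of any established algorithm.

```python
# Weak exogeneity facts for the chain example, as stated in the text:
# A is weakly exogenous for B, C, D; B for C, D; C for D; D for nothing.
weakly_exogenous_for = {
    "A": {"B", "C", "D"},
    "B": {"C", "D"},
    "C": {"D"},
    "D": set(),
}

def adjacent_edges(wx):
    """Keep X -> Y only if X is weakly exogenous for Y and no Z lies between
    them (X weakly exogenous for Z and Z weakly exogenous for Y)."""
    edges = []
    for x, targets in wx.items():
        for y in sorted(targets):
            has_intermediate = any(y in wx[z] for z in targets if z != y)
            if not has_intermediate:
                edges.append((x, y))
    return sorted(edges)

edges = adjacent_edges(weakly_exogenous_for)
# edges == [("A", "B"), ("B", "C"), ("C", "D")]
```

The non-adjacent relations, such as A being weakly exogenous for C, are discarded because B sits between A and C in the hierarchy, which is why they do not license the inference of a direct edge.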
Although we have not provided a proof, these cases suggest how to read weak exogeneity off a causal graph. There are four conjectured criteria:
A. Within a set of variables that form a cointegration group, a particular variable is weakly exogenous for the group if, and only if, it is the sole source of the local trend that cointegrates the group;
B. The parents in any set of variables that form a collider group in which two or more local trends are combined are weakly exogenous for the child in the collider group, provided that the number of variables in the group is fewer than one plus the number of fundamental trends carried by those variables;
C. If a collider fulfills criterion B, then, in any set that replaces one or more weakly exogenous parents with a variable in the same cointegration group as that parent, the replacement variable, provided that it is itself weakly exogenous for the parent, will also be weakly exogenous for the child. (Thus, in Figure 5, in the collider {A, C, E}, {A, C} α E; and in the set in which B replaces C (both in the same collider group), {A, B} α E);
D. If a collider fulfills criterion B, then any variable that is weakly exogenous for the child, either as a parent or as a member of the same cointegration group that replaces the parent, will be weakly exogenous for a variable that replaces the child from a cointegration group that includes the child and for which it is weakly exogenous. (Thus, in Figure 2, {T1, T2} α B, but in the set that replaces B with D, which are both in the same cointegration group, {T1, T2} α D.)
The inferential lessons of Cases 1-5 can be summarized in three conjectured rules, consistent with visual reading of the graph:

Rule 1.
If A α B, then it is not the case that B → A.
Rule 1 simply says that causation cannot run against the direction of weak exogeneity.

Rule 2.
In a cointegration group, if A α B and there is no C such that A α C and C α B, then A → B.
Rule 2 says that bivariate weak exogeneity coincides with direct causation, provided that the variables are adjacent. 17

Rule 3.
A set of variables W with k ≤ q members forms a collider at one of its members (call it variable C) if (i) IC(W); (ii) W_{-C} α C, where W_{-C} is the set W omitting C; (iii) it is not the case that any member B ∈ W_{-C} is a member of a cointegration group Z such that, for any member D ∈ Z (excluding B), B α D and W_{-B+D} α C, where W_{-B+D} is W with D taking the place of B; and (iv) it is not the case that C is a member of a cointegration group Z such that, for any member E ∈ Z (excluding C), E α C.
Rule 3 says that if a set of k + 1 variables is irreducibly cointegrated and k variables are jointly weakly exogenous for the (k + 1)th variable, then they form a collider, provided that each of the weakly exogenous variables is adjacent to the remaining variable (established by conditions (iii) and (iv)).

The Basis for a Long-Run Causal Search Algorithm?
The DGP that adequately represents the causal structure in the economy is not directly observable. Might it be inferred on the basis of data and not simply imposed as a priori restrictions on the CVAR? Based on our analysis of long-run causal structure, can we recover reliable information about the underlying DGP from the facts of cointegration and weak exogeneity analogously to the way in which graphical causal search algorithms infer causal structure for stationary data from empirical evidence about probabilistic dependence and independence among subsets of variables? Davidson (1998, Section 3) proposes a search algorithm that identifies every irreducible cointegrating set of variables within a CVAR. He then uses that information where possible to identify the cointegrating relations in the β matrix. This strategy is successful in some cases and not others. There is an analogy with causal search for stationary variables. Despite the slogan, "correlation is not causation," it is sometimes possible to infer causal direction from tests of unconditional dependence. For example, for a causally sufficient set of three stationary variables with an acyclical data-generating process, if A and C are not correlated, but A and B and B and C are correlated, then A → B ← C is the only consistent causal graph. In most cases, however, unconditional independence is not enough. Relations of conditional dependence and independence provide a richer source of information for inferring the direction, as well as the existence, of causal edges (see Section 2.2 above).
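The three-variable example for stationary data can be written as a tiny decision rule. The sketch below is illustrative only: it takes the correlation facts as given by an oracle and assumes, as the text does, causal sufficiency and acyclicity (plus the usual assumption that independence reflects the graph rather than cancelling coefficients). The function name `find_unshielded_collider` is hypothetical.

```python
from itertools import combinations

def find_unshielded_collider(variables, correlated):
    """Given three variable names and the set of correlated pairs (as frozensets),
    return 'X -> M <- Y' when exactly one pair is uncorrelated, else None."""
    pairs = [frozenset(p) for p in combinations(variables, 2)]
    uncorrelated = [p for p in pairs if p not in correlated]
    if len(uncorrelated) != 1:
        return None  # unconditional (in)dependence alone does not orient the edges
    x, y = sorted(uncorrelated[0])
    (middle,) = set(variables) - uncorrelated[0]
    return f"{x} -> {middle} <- {y}"

# A and B correlated, B and C correlated, A and C uncorrelated.
correlated = {frozenset({"A", "B"}), frozenset({"B", "C"})}
result = find_unshielded_collider(("A", "B", "C"), correlated)
# result == "A -> B <- C"
```

When all three pairs are correlated the function returns `None`, which corresponds to the remark that, in most cases, unconditional independence is not enough.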
Davidson's schema places cointegration in something like the logical role of unconditional independence (or correlation) in the stationary case. The analysis of Section 4 suggests that Davidson's inferential scheme can be further developed by explicitly recognizing, first, that the ultimate source of nonstationarity in any set of variables is often found in latent trends and, second, that assessment of weak exogeneity may provide evidence of causal asymmetry. Within irreducibly cointegrated subsets of the variables, weak exogeneity can function in something like the logical role of conditional independence, when processed according to the three rules of Section 4.4, and may provide richer, empirically grounded information about the identification of the CVAR. As with causal search in the stationary case, the application of these rules is unlikely to identify every possible causal graph but may sometimes be able to partially or completely uncover the underlying causal structure.
To illustrate, we analyze two cases: one with and one without causal sufficiency.

Long-Run Causal Search in a Causally Sufficient Graph
Consider the DGP in Figure 2 and assume that its variables are causally sufficient and all (including the fundamental trends) are observed. We are interested in the logic of causal inference rather than the statistical problem of inference, so we also assume that prior statistical testing has successfully identified the facts with respect to the cointegration rank of the system and cointegration and weak exogeneity among any subset of variables. (In the language of the causal search literature, we assume that we have an oracle.) Naturally, in practice our inference cannot be more certain than the statistical inferences that provide our assumed facts. Can we use this information to recover the graph of the DGP?
The inference problem can be viewed as how to place the zero and non-zero coefficients in the α and β matrices in Equation (7).
Given that we know that the cointegration rank is 5, we know that there are two fundamental trends. This implies that α is 7 × 5 and β′ is 5 × 7. Since T1 and T2 are weakly exogenous with respect to all other variables in the system, we may conclude that, even if they are not identical with the fundamental trends (which in this case, of course, they are), they are at least the unique sources introducing those trends into the system. And we are entitled to enter zeroes in the entire rows of α corresponding to T1 and T2. Without loss of generality, we may enter non-zero α_ij's along the main diagonal of the submatrix of α that excludes the T1 and T2 rows, and zeroes everywhere else. Similarly, we may enter ones on the main diagonal of the submatrix of β′ that excludes the last two columns.
With two fundamental trends, no irreducible cointegrating relation can involve more than three variables. Exhaustive consideration along Davidson's lines would produce 21 possible cointegrating pairs and 35 possible cointegrating triples. Similarly, we need to consider possible weak exogeneity of variables within each irreducibly cointegrating subset. Most of the subsets are not irreducibly cointegrating or do not contain weakly exogenous variables, so rather than tediously listing the weak-exogeneity status of all 56 subsets systematically, we just note the salient ones.
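The counting behind these numbers is easy to verify. A minimal sketch, using only the seven variable names and the cointegration rank given in the text (the variable names in the code are otherwise arbitrary):

```python
from itertools import combinations

variables = ["A", "B", "C", "D", "E", "T1", "T2"]
rank = 5

# Number of fundamental trends = number of variables minus cointegration rank.
n_trends = len(variables) - rank
# With q trends, any q + 1 variables are cointegrated, so an irreducible
# set has at most q + 1 members.
max_irreducible_size = n_trends + 1

pairs = list(combinations(variables, 2))
triples = list(combinations(variables, 3))
n_candidate_subsets = len(pairs) + len(triples)
```

Here `n_trends` is 2, `max_irreducible_size` is 3, and the candidate subsets number 21 pairs plus 35 triples, for 56 in all.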
From the facts that CI({A, T1}) and that there are no other variables in this cointegration group and that T1 α A, Rule 2 implies T1 → A, which justifies the placement of β_AT1 in row 1 of β′ and zeroes in the remaining unassigned places in that row. Analogous reasoning with respect to {C, T2} implies T2 → C and justifies the placement of β_CT2 and the zeroes in row 3. Again, with respect to {B, D}, analogous reasoning justifies the placement of β_DB and the zeroes in row 4. In addition, in this case, Rule 1 and the fact that B α D imply that not (D → B) and justify the zero in row 2, column 4.
Rule 3 and the facts that IC({T1, T2, B}), that B is not part of a cointegration group with either T1 or T2, and that {T1, T2} α B allow us to identify the collider T1 → B ← T2 and justify the placement of β_BT1 and β_BT2 and the remaining zeroes in row 2 of β′.
Rule 3 and the facts that IC({B, C, E}), that {B, C} α E, and that not (C α T2), with which C forms a cointegration group, allow us to identify the collider B → E ← C and justify the placement of β_EB and β_EC and the zeroes in row 5 of β′. With that, we are able to recover the entire DGP graph using only the facts of cointegration and weak exogeneity.
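For reference, the placements described in this subsection can be collected in one display. This is a sketch assembled from the row-by-row description above, with columns ordered (A, B, C, D, E, T1, T2); the labels α_A, ..., α_E for the diagonal adjustment coefficients are introduced here for illustration, and the signs attached to the β coefficients are left implicit rather than taken from Equation (7).

```latex
\alpha =
\begin{bmatrix}
\alpha_{A} & 0 & 0 & 0 & 0 \\
0 & \alpha_{B} & 0 & 0 & 0 \\
0 & 0 & \alpha_{C} & 0 & 0 \\
0 & 0 & 0 & \alpha_{D} & 0 \\
0 & 0 & 0 & 0 & \alpha_{E} \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix},
\qquad
\beta' =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & \beta_{AT_1} & 0 \\
0 & 1 & 0 & 0 & 0 & \beta_{BT_1} & \beta_{BT_2} \\
0 & 0 & 1 & 0 & 0 & 0 & \beta_{CT_2} \\
0 & \beta_{DB} & 0 & 1 & 0 & 0 & 0 \\
0 & \beta_{EB} & \beta_{EC} & 0 & 1 & 0 & 0
\end{bmatrix}
```

Each row of β′ has a one in the column of the effect and non-zero entries only in the columns of its direct causes, and the two zero rows of α mark T1 and T2 as weakly exogenous for the whole system.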

Long-Run Causal Search in the Presence of Latent Trends
The CVARs typically estimated in practice most often do not contain variables that are weakly exogenous for the whole system, which could, therefore, be identified as the conduit of the fundamental trends to the other variables in the system. It is, therefore, worth considering how the principles of search might operate when fundamental trends are latent variables. It is possible to apply the rules of Section 4.2 to the variables generated according to Equation (7) when only the ordinary variables (A, B, C, D, E), but not the fundamental trends (T1 and T2), are observed.
For some of the causal edges, the reasoning of Section 4.3 is still applicable, and we would be able to infer the edges shown in Figure 7: B → D and B → E ← C. The remainder of Figure 7 requires further comment. We are unable to infer the edges between T1, T2 and A, B, and C for the simple reason that the two fundamental trends are not observed and the inference of the edges in which they are involved requires their observability. However, we do know from the fact that the cointegration rank is 3 that there are two fundamental trends. What we cannot say, however, is precisely how those two trends enter directly into the observable system. They may, in fact, be transmitted through ordinary variables that are also latent. We do know that they must enter through A, B, or C. If that were not the case and a fundamental trend entered through D or E, we would not have found that CI({B, D}) or {B, C} α E. This is indicated in Figure 7 by the oval enclosing the ordinary variables and the circles (indicating their latency) around the fundamental trends.
The arrows running from the latent fundamental trends to the oval, stopping short of the particular variables, indicate that we know that these variables are caused by these trends, although we do not know exactly what the connections are. Thus, instead of (7), we can fill in the causally ordered CVAR Equation (15) with the ambiguous information depicted in Figure 7, where the question marks indicate parameters that correspond to possible, but yet-to-be-determined, causal edges. With the columns ordered (A, B, C, D, E, T1, T2), the β′ matrix of (15) is
$$
\beta' =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & ? & ? \\
0 & 1 & 0 & 0 & 0 & ? & ? \\
0 & 0 & 1 & 0 & 0 & ? & ? \\
0 & \beta_{DB} & 0 & 1 & 0 & 0 & 0 \\
0 & \beta_{EB} & \beta_{EC} & 0 & 1 & 0 & 0
\end{bmatrix}.
$$
Equation (15) depicts what observables imply about the DGP and not just facts about the observables themselves. Here the two trends are not observable, but we know that there are two latent trends because none of the observable variables is weakly exogenous when one considers the whole set of observable variables, which again justifies the placement of the two zero rows in α.
Neither the graph nor (15) conveys all the information that we have. We know, for instance, that there are two fundamental trends and that at least one of the fundamental trends must causally influence each of A, B, and C. If that were not so, then the only way that all three variables could carry the trends and be irreducibly cointegrated would be for them to form a collider group in which one pair is weakly exogenous for the remaining variable. Given the DGP, we know that the weak exogeneity search should not find such a pattern. Furthermore, we know that no two of A, B, and C could have a common latent cause. If that were not true, that pair would form a cointegration group, which, given the DGP, the search for cointegrating pairs should not find. These two conclusions imply that each of the three observed variables carries the fundamental trends in distinct proportions. These facts place restrictions on how the last two columns of the β′ in (15) can be filled in to be consistent with the DGP. In particular, in the 3 × 2 submatrix in the upper right-hand corner of β′, at least one row must contain two nonzero entries and the remaining two rows cannot have zeroes in the same column. This guarantees that the variables A, B, C form a cointegration group without also forming a collider group with weakly exogenous parents.
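The restriction on the upper right-hand 3 × 2 block can be expressed as a simple check on zero patterns. The sketch below is one reading of the restriction, not a procedure from the paper: it treats the block as a pattern of zero and non-zero entries for A, B, and C on the two trends, additionally requires every variable to load on at least one trend (as the paragraph states), and uses the hypothetical function name `admissible_trend_pattern`.

```python
def admissible_trend_pattern(pattern):
    """pattern: three rows (A, B, C) of two flags each; truthy = non-zero entry.
    Returns True if some row has two non-zero entries and the other two rows
    do not have zeroes in the same column."""
    rows = [tuple(bool(v) for v in row) for row in pattern]
    if len(rows) != 3 or any(len(r) != 2 for r in rows):
        raise ValueError("expected a 3 x 2 pattern")
    # Each of A, B, C must be influenced by at least one fundamental trend.
    if not all(any(r) for r in rows):
        return False
    for i, row in enumerate(rows):
        if all(row):
            a, b = (rows[j] for j in range(3) if j != i)
            shared_zero = any(not a[c] and not b[c] for c in range(2))
            if not shared_zero:
                return True
    return False

# Pattern of the DGP in the text: A loads on T1, B on both, C on T2.
dgp_pattern = [(1, 0), (1, 1), (0, 1)]
# A and C both load only on T1: they would form a cointegrating pair.
bad_pattern = [(1, 0), (1, 1), (1, 0)]
```

Under this reading `dgp_pattern` is admissible and `bad_pattern` is not, since A and C would then share a single trend and form a cointegration group that the search should not find.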

Conclusions
In the history of econometrics, the problem of identification and the notion of causal order have long been connected, both in the work of Simon and the early Cowles Commission program and in the literature on SVARs. Typically, economists have relied heavily on the idea that a priori restrictions derived somehow from economic theory would provide the needed identification. Recent work on graphical causal modeling, however, has shown that there is often unexploited information that could provide a firmer, empirical basis for identification. In the case of cross-sectional data or the contemporaneous causal orderings of SVARs, the graphical causal modelers have stressed the information contained in conditional independence relationships encoded in the probability distribution of the data. Conditional independence may also be a resource in the case of the long-run dynamics of the CVAR, although the fact that nonstationary data involve non-standard distributions poses some challenges. We have suggested here that nonstationary data also present the opportunity to take a different approach.
Where do the trends we observe among macroeconomic variables come from? We showed that it is possible for the structure of the DGP to be such that a set of observable variables trends without any fundamental trends acting as drivers. Yet, we have argued that these cases rely on particular configurations of coefficients that are unlikely to be robust to small changes in coefficients and that call out for an economic explanation of why they arise at all. Once a distinction is drawn between fundamental trends and ordinary variables, it is clear that a more robust account of nonstationary behavior is that it is transmitted from its fundamental sources to variables that, without these fundamental trends as direct or indirect causes, would not naturally be nonstationary. In typical CVAR analysis, econometricians mostly do not find variables that themselves can be identified as the source of fundamental trends. This suggests that, in most cases, fundamental trends are latent variables, and any sort of structural or causal analysis of CVARs must account for their latency.
We suggested, somewhat informally, that combining Davidson's suggestion of a comprehensive search for sets of irreducible cointegrating relations with a similar comprehensive search for weak exogeneity among those sets could provide a non-a priori empirical basis for discovering identifying restrictions on cointegrating relations, as well as information on causal direction. We showed that in a simple example, the complete causal graph of the CVAR could be recovered. But, in most cases in the face of latent variables, these restrictions are unlikely to provide complete identification. Nevertheless, as in our illustration, some of the cointegrating relations may be identified, even when there are latent trends. In some cases, it may also be possible to recover estimates of the trends using state-space methods (see, e.g., Johansen and Tabor 2017). Finally, viewing the CVAR through the lens of latent fundamental trends reinforces Juselius's advocacy of simple-to-general modeling in the CVAR context (Juselius 2006, Chapter 22, especially Sections 22.2.3 and 22.3). Cointegrating relations are robust to widening the data set to include more variables. The aim of such widening can be seen as an effort to discover the observable variables that are the counterpart of the latent trends in narrower data sets.
Funding: This research received no external funding.