Causal Transmission in Reduced-Form Models

We propose a method to explore the causal transmission of an intervention through two endogenous variables of interest. We refer to the intervention as a catalyst variable. The method is based on the reduced-form system formed from the conditional distribution of the two endogenous variables given the catalyst. The method combines elements from instrumental variable analysis and Cholesky decomposition of structural vector autoregressions. We give conditions for uniqueness of the causal transmission.


Introduction
In general, it is difficult to deduce the causal ordering of two observed variables from their joint distribution. However, if we can assume that a third variable is causal, it may be possible to deduce how the effect of this third variable will transmit between the two variables of interest. By conditioning on a catalyst, the joint distribution of a bivariate system can be used to infer a causal transmission. Our approach allows for different catalysts transmitting through the same two variables in different ways. We formulate this for a general distributional setup.
Philosophers and scientists argue that some background of causal knowledge is required in order to construct new causal facts. The view "no causes in, no causes out" (Cartwright 1989) expresses the concern that we cannot jump from theory to cause without some causal facts in hand. Pearl (2000) similarly underlines the importance of distinguishing between causal and associational concepts, as every causal conclusion relies on a causal assumption that is untested in observational studies. In contrast, Granger (1969) causality is an example of an associational concept seeking to infer correlations from data without a causal assumption. Moreover, Granger causality is concerned with temporal correlations as opposed to the ordering of contemporaneous variables. Causal analysis goes one step further by inferring correlations under changing conditions. We combine elements from instrumental variable analysis and recursive ordering of structural vector autoregressions. Instrumental variable analysis will in general not order the endogenous variables but can be used to identify a structural relation uniquely. Cholesky decomposition orders endogenous variables, but the ordering is not unique. By carrying out a Cholesky decomposition in the presence of an instrument, there is scope for a unique ordering which is interpretable as a causal transmission. In this situation we will refer to the instrument as a catalyst.
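The catalyst $w$ may transmit causally through the variables $y, z$. It is possible that $w$ transmits through $z$ to $y$ or through $y$ to $z$ or, of course, that there is no ordering of the variables. We present two sets of testable conditions. A first set of conditions is needed for establishing that the catalyst $w$ transmits through $z$ to $y$, say, in a unique fashion. A second set of conditions is needed for establishing that the transmission is non-trivial.

Suppose we are interested in an economic relationship between two endogenous, or modeled, variables $(y, z)$ given a third variable $w$. Thus, we are interested in the conditional distribution $f(y, z|w)$. Under normality, the distribution of $y, z$ given $w$ is given by
$$y = \gamma_{yw} w + \varepsilon_y, \qquad z = \gamma_{zw} w + \varepsilon_z, \tag{1}$$
where the innovations are normally distributed with positive definite variance
$$\begin{pmatrix} \varepsilon_y \\ \varepsilon_z \end{pmatrix} \overset{D}{=} N\left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{yy} & \sigma_{yz} \\ \sigma_{zy} & \sigma_{zz} \end{pmatrix} \right\}. \tag{2}$$
There are two different ways of ordering $y$ and $z$, corresponding to two Cholesky decompositions. First, we can condition $y$ on $z$ to obtain the equations
$$y = \gamma_{yz} z + \gamma_{yw\cdot z} w + \varepsilon_{y\cdot z}, \tag{3}$$
$$z = \gamma_{zw} w + \varepsilon_z, \tag{4}$$
with derived coefficients $\gamma_{yz} = \sigma_{yz}/\sigma_{zz}$ and $\gamma_{yw\cdot z} = \gamma_{yw} - \gamma_{yz}\gamma_{zw}$, and independent, normal innovations $\varepsilon_{y\cdot z}, \varepsilon_z$ with variances $\sigma_{yy\cdot z} = \sigma_{yy} - \sigma_{zy}^2/\sigma_{zz}$ and $\sigma_{zz}$. Second, we can condition $z$ on $y$ to obtain the equations
$$y = \gamma_{yw} w + \varepsilon_y, \tag{5}$$
$$z = \gamma_{zy} y + \gamma_{zw\cdot y} w + \varepsilon_{z\cdot y}, \tag{6}$$
where $\gamma_{zy} = \sigma_{zy}/\sigma_{yy}$ and $\gamma_{zw\cdot y} = \gamma_{zw} - \gamma_{zy}\gamma_{yw}$, as well as the independent, normal innovations $\varepsilon_{z\cdot y}, \varepsilon_y$ with variances $\sigma_{zz\cdot y} = \sigma_{zz} - \sigma_{yz}^2/\sigma_{yy}$ and $\sigma_{yy}$. Without further information, the two orderings are equivalent in the sense of giving the same joint distribution.

A unique ordering arises from the Equations (3) and (4) under the restrictions $\gamma_{yw\cdot z} = 0$ and $\gamma_{zw} \neq 0$. The Equations (3) and (4) then reduce to
$$y = \gamma_{yz} z + \varepsilon_{y\cdot z}, \tag{7}$$
$$z = \gamma_{zw} w + \varepsilon_z. \tag{8}$$
This ordering of $(y, z)$ is unique in the sense that it is not possible to have $\gamma_{yw\cdot z} = 0$ and $\gamma_{zw} \neq 0$, so that (3) and (4) reduce to (7) and (8), and at the same time have $\gamma_{zw\cdot y} = 0$ in (5) and (6). We prove this result for general distributions in Section 3.1.1.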
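The uniqueness claim for the normal case can be checked numerically. The following is a minimal sketch, assuming illustrative parameter values that are not taken from the paper; the function name `conditional_orderings` and the dictionary keys are hypothetical. It computes the derived coefficients of both orderings and shows that imposing $\gamma_{yw\cdot z} = 0$ with $\gamma_{zw} \neq 0$ leaves $\gamma_{zw\cdot y} = (1 - \rho^2)\gamma_{zw}$, which is non-zero when the covariance matrix is positive definite.

```python
def conditional_orderings(g_yw, g_zw, s_yy, s_yz, s_zz):
    """Derived coefficients for the two orderings of (y, z) given w."""
    # Ordering 1: y conditioned on z, as in Equations (3) and (4).
    g_yz = s_yz / s_zz
    g_yw_z = g_yw - g_yz * g_zw
    # Ordering 2: z conditioned on y, as in Equations (5) and (6).
    g_zy = s_yz / s_yy
    g_zw_y = g_zw - g_zy * g_yw
    return {"g_yz": g_yz, "g_yw_z": g_yw_z, "g_zy": g_zy, "g_zw_y": g_zw_y}


# Illustrative values (assumptions): a positive definite covariance matrix
# and an informative catalyst, gamma_zw != 0.
s_yy, s_yz, s_zz = 2.0, 0.8, 1.0
g_zw = 1.5
g_yw = (s_yz / s_zz) * g_zw  # imposes gamma_{yw.z} = 0

out = conditional_orderings(g_yw, g_zw, s_yy, s_yz, s_zz)
rho2 = s_yz ** 2 / (s_yy * s_zz)

print("gamma_yw.z:", out["g_yw_z"])
print("gamma_zw.y:", out["g_zw_y"], "vs (1 - rho^2) * gamma_zw:", (1 - rho2) * g_zw)
```

The design choice here is to work directly with the population parameters rather than simulated data, so the check isolates the algebra from sampling noise.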
We also want to ensure that a shock represented by $w$ feeds through to $y$ in the system (7) and (8). We analyze this in two steps. In Section 3.1.2, we will say that (7) and (8) has a non-trivial Markov structure if changes in $w$ impact the distribution of $z$ and changes in $z$ impact the distribution of $y$. In (7) and (8), this requires that $\gamma_{zw} \neq 0$ and $\gamma_{yz} \neq 0$. In general, however, this is not sufficient to ensure that changes in $w$ impact the distribution of $y$. For this to happen in the system (7) and (8), it is required that $\gamma_{yw} \neq 0$ in the marginal Equation (1). While this condition follows from previous conditions in the case with normal errors, this is not true for general distributions, so we provide a detailed analysis of this condition.
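For the normal case, the step from the earlier conditions to the condition on the marginal coefficient of $w$ can be spelled out by substituting (8) into (7). This is a short sketch under the linear, normal setup only; it does not carry over to general distributions.

```latex
\begin{align*}
y &= \gamma_{yz} z + \varepsilon_{y\cdot z}
   = \gamma_{yz}\,(\gamma_{zw} w + \varepsilon_z) + \varepsilon_{y\cdot z}
   = \gamma_{yz}\gamma_{zw}\, w + (\gamma_{yz}\varepsilon_z + \varepsilon_{y\cdot z}), \\
\gamma_{yw} &= \gamma_{yz}\gamma_{zw},
\qquad\text{so that}\qquad
\gamma_{yw} \neq 0 \iff \gamma_{yz} \neq 0 \ \text{and}\ \gamma_{zw} \neq 0 .
\end{align*}
```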
The conditions mentioned above would all be testable when implemented in a statistical model. A framework for causal interpretation of the above structure is discussed in Section 3, which is distinct from the structural interpretations in the usual instrumental variable problem, see Section 4.2, and from super exogeneity, see Section 4.5.
We note, in passing, that the situation described in Equations (7) and (8) is different from the common features concept (Centoni and Cubadda 2015; Engle and Kozicki 1993; Vahid and Engle 1993). There, the objective is, starting from Equation (1), to find a linear combination of $y$ and $z$ that does not depend on $w$. Under the relevance condition $\gamma_{zw} \neq 0$, we can define $\delta_{yz} = \gamma_{yw}/\gamma_{zw}$ and find that $y - \delta_{yz} z$ does not depend on $w$. Thus, we obtain
$$y = \delta_{yz} z + \varepsilon_\delta, \tag{9}$$
$$z = \gamma_{zw} w + \varepsilon_z, \tag{10}$$
where $\varepsilon_\delta = \varepsilon_y - \delta_{yz}\varepsilon_z$. The covariance of $\varepsilon_\delta$ and $\varepsilon_z$ is $\sigma_{yz} - \delta_{yz}\sigma_{zz}$. This covariance reduces to zero under the additional restriction that $\delta_{yz} = \gamma_{yw}/\gamma_{zw}$ equals $\gamma_{yz} = \sigma_{yz}/\sigma_{zz}$, in which case $\varepsilon_\delta = \varepsilon_{y\cdot z}$, and the system (9) and (10) reduces to (7) and (8). We will see that the additional independence assumption for $\varepsilon_{y\cdot z}$ and $\varepsilon_z$ is what gives the ordering of the variables.

Causal Transmission
We analyze a joint conditional probability model for two endogenous variables given a third variable, with a view to establishing conditions for unique asymmetric flow of influence from the conditioning variable. We give results for unique ordering and non-trivial transmission in a general bivariate distribution setup. From this, we define causal transmission.

Result for General Distributions
For a general joint density of y, z conditional on w, f(y, z|w), we explore testable restrictions that ensure a unique and non-trivial chain from w through z to y. In Section 3.2, we interpret w as a catalyst that initiates a unique causal transmission through z to y.

Unique Markov Structure
The natural generalization of the result for normal distributions is a Markov property. Generally, the joint density of $(y, z|w)$ can be decomposed as
$$f(y, z|w) = f(y|z, w) f(z|w) = f(z|y, w) f(y|w). \tag{11}$$
At this point, there is no natural ordering of the bivariate system. The uniqueness result is inspired by the normal example. It presents a condition under which we can rule out the possibility that both $f(y|z, w) = f(y|z)$ and $f(z|y, w) = f(z|y)$ hold. In other words, we give a condition that ensures a Markov chain from $w$ to $y$ through $z$ while excluding a Markov chain from $w$ to $z$ through $y$. A key feature of the result is that it is concerned with properties of the conditional distribution of $y, z$ given $w$. The proof builds on the ideas in the proof of the intersection property by Lauritzen (1996, Proposition 2.1), see also Dawid (1979, Lemma 4.3). That result is, however, aimed at exploring properties of the simultaneous distribution of three variables $y, z, w$.

Theorem 1.
Suppose the density $f(y, z|w)$ has support on a product space, and it is positive on this support. Suppose that, for all $y, z$,
$$f(y|z, w) = f(y|z). \tag{12}$$
Then
$$f(z|w) \neq f(z) \text{ in a set of } y, z \text{ with positive probability} \tag{13}$$
$$\Rightarrow \quad f(z|y, w) \neq f(z|y) \text{ in a set of } y, z \text{ with positive probability.} \tag{14}$$
The requirement in Theorem 1 that the support is a product space is satisfied in a range of common situations, for instance, in a normal setup. It allows for the less interesting case where y or z is atomic. If z is atomic, then condition (13) always fails. If y is atomic, then conclusion (14) reduces to (13).
Theorem 1 gives conditions for a unique Markov structure among the variables. Condition (12) implies
$$f(y, z|w) = f(y|z) f(z|w). \tag{15}$$
Theorem 1 shows that conditions (12) and (13) imply (14), and therefore there is no Markov structure from $w$ through $y$ to $z$, that is
$$f(y, z|w) \neq f(z|y) f(y|w) \text{ in a set with positive probability.} \tag{16}$$
In other words, the conditional model for $y, z$ given $w$ allows for two possible Markov structures, but we can distinguish these through testable assumptions.

Non-Trivial Markov Structure
The next step is a requirement that the Markov structure is non-trivial.
Definition 1. Consider the conditional distribution of $y, z$ given $w$ with the Markov structure $f(y, z|w) = f(y|z) f(z|w)$. If $f(y|z) \neq f(y)$ and $f(z|w) \neq f(z)$ on a set with positive probability, we have a non-trivial Markov structure. We represent this by the graph $\underline{w} - z - y$, where the variable $w$ is underlined to emphasize the conditioning on $w$.
The conditioning on $w$ is emphasized by underlining the conditioning variable $w$ in the notation $\underline{w} - z - y$. This is to contrast with the notation $w - z - y$ commonly used for undirected graphs in the graphical model literature. That notation is usually taken to imply that the unconditional distribution $f(y, z, w)$ satisfies the Markov property
$$f(y, w|z) = f(y|z) f(w|z), \tag{17}$$
see Lauritzen (1996, §2.4). The Markov property (17) in the unconditional distribution $f(y, z, w)$ implies the Markov property (15) in the conditional distribution $f(y, z|w)$ due to Bayes' Theorem, while the opposite implication requires the formulation of a distribution for $w$. In both cases, the dash notation is used as opposed to arrows to indicate that the Markov structures are undirected. We now combine Theorem 1 and Definition 1 to see that the two non-trivial Markov structures $\underline{w} - z - y$ and $\underline{w} - y - z$ cannot hold simultaneously.

Theorem 2.
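Suppose the density $f(y, z|w)$ has support on a product space, and it is positive on this support. Suppose that, for all $y, z$,
$$f(y|z, w) = f(y|z), \tag{18}$$
and that, for all $y, z$ in a set with positive probability,
$$f(y|z) \neq f(y) \quad \text{and} \quad f(z|w) \neq f(z). \tag{19}$$
Then, we have a unique and non-trivial Markov structure $\underline{w} - z - y$.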

Non-Trivial Transmission
A non-trivial Markov structure does not, in general, imply that $w$ and $y$ are dependent, so $w$ may affect $z$ without affecting $y$. Indeed, the Markov structure $\underline{w} - z - y$ allows the possibility that $y$ and $w$ are independent. From a causal viewpoint, this is not so exciting, so we will seek to characterize when the effect is non-trivial.
For a non-trivial Markov structure $\underline{w} - z - y$, the conditional distribution of $y$ given $w$ can be written as the compound distribution
$$f(y|w) = \int f(y|z) f(z|w)\, dz, \tag{20}$$
where $f(y|z) \neq f(y)$ and $f(z|w) \neq f(z)$. The integral can be interpreted as summation if the dominating measure $dz$ is discrete. We would like to establish conditions ensuring $f(y|w) \neq f(y)$.
Definition 2. Consider a non-trivial Markov structure $\underline{w} - z - y$. There is a non-trivial transmission between $w$ and $y$ when $f(y|w) \neq f(y)$ in a set with positive probability, represented as $\underline{w} - y$.
We give a sufficient condition for a trivial transmission.
The condition in Lemma 1 for trivial transmission is the contradiction of condition (19) in Theorem 2 for a non-trivial Markov structure.
The trivial transmission property is also related to the singleton transitivity property, as expressed, for instance, in Wermuth (2012, §2.4); Fallat et al. (2017) also link singleton transitivity with a total positivity property. The difference between the concepts is subtle. Under singleton transitivity, the condition and the implication in Lemma 1 are swapped.
The condition in Lemma 1 is not necessary for a trivial transmission. Indeed, the following example for a conditional distribution f(y, z|w) is a case where the contrary condition (19) holds, yet the transmission is trivial. Similar examples for an unconditional distribution f(y, z, w) are given in Birch (1963, eq. 5.4) and Wermuth (2012, §4.1) to illustrate that distributions may not have the singleton transitivity property in general.
Example 1. Suppose $\underline{w} - z - y$. We construct an example where it holds that $f(y|z) \neq f(y)$ and $f(z|w) \neq f(z)$, yet $f(y|w) = f(y)$. Let $w, y$ be binary, while $z$ takes three values. Describe the conditional distributions $f(z|w)$ and $f(y|z, w) = f(y|z)$ by the transition matrices
$$\begin{array}{c|ccc} f(z|w) & z=0 & z=1 & z=2 \\ \hline w=0 & 4/8 & 3/8 & 1/8 \\ w=1 & 4/8 & 2/8 & 2/8 \end{array} \qquad \begin{array}{c|cc} f(y|z) & y=0 & y=1 \\ \hline z=0 & 1/4 & 3/4 \\ z=1 & 2/4 & 2/4 \\ z=2 & 2/4 & 2/4 \end{array}$$
The conditional distribution $f(y|w)$, computed as the product of the transition matrices, satisfies $f(y|w) = f(y)$, that is
$$\begin{array}{c|cc} f(y|w) & y=0 & y=1 \\ \hline w=0 & 3/8 & 5/8 \\ w=1 & 3/8 & 5/8 \end{array}$$

From a causal transmission perspective, we are interested in exploring when the condition (19) in Theorem 2 for a non-trivial Markov structure is sufficient to give a non-trivial transmission. As remarked earlier, this holds for distributions satisfying the singleton transitivity property. This is satisfied if $w, z, y$ satisfy a joint normal distribution (Wermuth 2012, §4.1) or are all binary (Birch 1963; Simpson 1951, §5). We give some further examples. In the first case, $z$ is binary, but $w$ and $y$ need not be binary. Then, the question relates to collapsibility of contingency tables, see, for instance, Dawid (1980, Theorem 8.3), which is attributed to Yule.
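The matrix product in Example 1 can be reproduced with exact rational arithmetic. The sketch below uses only the probabilities stated in the example; the helper name `compose` is hypothetical.

```python
from fractions import Fraction as F

# f(z|w): rows indexed by w in {0, 1}, columns by z in {0, 1, 2}.
f_z_given_w = [[F(4, 8), F(3, 8), F(1, 8)],
               [F(4, 8), F(2, 8), F(2, 8)]]
# f(y|z): rows indexed by z in {0, 1, 2}, columns by y in {0, 1}.
f_y_given_z = [[F(1, 4), F(3, 4)],
               [F(2, 4), F(2, 4)],
               [F(2, 4), F(2, 4)]]


def compose(first, second):
    """Product of two row-stochastic transition matrices."""
    return [[sum(row[k] * second[k][j] for k in range(len(second)))
             for j in range(len(second[0]))]
            for row in first]


# f(y|w) = sum over z of f(y|z) f(z|w), the discrete form of (20).
f_y_given_w = compose(f_z_given_w, f_y_given_z)
for w_value, row in enumerate(f_y_given_w):
    print(f"w={w_value}:", [str(p) for p in row])
# w=0: ['3/8', '5/8']
# w=1: ['3/8', '5/8']
```

Both rows of the product coincide even though the rows of $f(z|w)$ differ and the rows of $f(y|z)$ are not all equal, which is the point of the example.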
Lemma 2. Suppose $\underline{w} - z - y$ with binary $z$. Then $\underline{w} - y$.
Moving away from binary z, we find the same result for some common distributions.

Causal Interpretation
Theorem 2 gave testable conditions ensuring that the conditional distribution $f(y, z|w)$ reduces to a non-trivial Markov structure $\underline{w} - z - y$. This was followed in Section 3.1.3 by a variety of conditions ensuring a non-trivial transmission between $w$ and $y$. In the following, we give this a causal interpretation. We will think of the variable $w$ as taking a value that is determined outside the system $(y, z)$. This value then transmits through the system as described by the conditional distribution $f(y, z|w)$.
Definition 3. Consider variables $w, z, y$. Assume that, for each realization of $w$, the density $f(y, z|w)$ describes the distribution of outcomes of $y, z$. Let $w$ represent an intervention on the system. Then, we say that $w$ is a catalyst.
By an intervention, we mean an external, autonomous change that affects only the specified subset of variables (Pearl 2000, p. 23). The objective is to separate actions, where variables are assigned values by intervention, and observations, where variables assume values according to a joint distribution. Pearl (2000) assumes, however, that the mechanism that is altered by an intervention is known, as is the nature of the alteration. Directionality within a system of variables is discovered or assumed prior to analysis of interventions by representing a joint distribution with a directed acyclic graph. In contrast, we only discover directionality conditional on the presence of an intervention; this corresponds to the notion of a transmission.
Definition 4. Consider a non-trivial Markov structure $\underline{w} - z - y$ with non-trivial transmission $\underline{w} - y$ and where $w$ is a catalyst. Then, we have a causal transmission of the catalyst $w$ to $y$ through $z$. This is represented by the notation $w \to z \to y$.
Definitions 3 and 4 consider the testable and undirected Markov structure $\underline{w} - z - y$ and give it a causal interpretation. In Definition 4, the notation $w \to z \to y$ is directional, so there is no longer a need for emphasizing the conditioning upon $w$ as in $\underline{w} - z - y$ or $\underline{w} - y$. The important distinction between our exposition and the existing literature is the objective of characterizing potential unique transmission of catalysts using testable assumptions as far as possible. Definition 4 has the feature that we are agnostic about the causal relationship between the endogenous variables when a catalyst is not present.
Catalysts will not always be obvious but can potentially be discovered as natural experiments through examination of observational data on $y_t, z_t, w_t$ for $t = 1, \dots, T$. In the empirical illustration in Section 6, we have a vector autoregression for the velocity and cost of holding money augmented with dummy variables representing fiscal and oil shocks determined outside the system. The density $f(y, z|w)$ is then taken as the i.i.d. density for the innovations of the modeled variables given the dummy variables and past information. If one is interested in modeling monetary policy, one may observe market interest rates, inflation and a policy rate. The central banks observe the past market interest rate and the inflation and set the policy rate to influence the current and future market interest rate and policy rate. The policy rate could be modeled as one of the $y, z$ variables, with $w$ representing major external shocks such as the COVID-19 pandemic. Alternatively, if the policy rate is thought of as the $w$ variable, there may be strong correlation with the lagged market rate and lagged inflation and one should consider the interpretation carefully. Causal transmission lends itself to statistical models with a repetitive structure which can be captured by the above ideas for $f(y, z|w)$.
The assumptions required for a causal transmission exclude the situation of a collider. A variable $z$ is a collider when $f(y|z, w) \neq f(y|z)$ even though $f(y|w) = f(y)$. In the graphical modeling literature, this is represented as $w \to z \leftarrow y$. The first condition for a collider contradicts the conditions assumed in Definition 1, $\underline{w} - z - y$, while the second condition contradicts the conditions assumed in Definition 2, $\underline{w} - y$.
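A collider can be illustrated with a small binary example. The distribution below is hypothetical and not taken from the paper: $w$ and $y$ are independent fair coins and $z$ is their exclusive-or. The sketch, with hypothetical helper names `prob` and `cond`, shows that $y$ is independent of $w$ marginally but depends on $w$ once $z$ is conditioned upon, so neither the Markov condition nor the non-trivial transmission condition can hold.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical joint distribution over (w, z, y): w and y are independent
# fair coins and z = w XOR y, so z is a collider between w and y.
joint = {(w, w ^ y, y): F(1, 4) for w, y in product((0, 1), repeat=2)}
NAMES = ("w", "z", "y")


def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(p for key, p in joint.items()
               if all(key[NAMES.index(n)] == v for n, v in fixed.items()))


def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(**event, **given) / prob(**given)


p_y1 = prob(y=1)
p_y1_given_w0 = cond({"y": 1}, {"w": 0})
p_y1_given_z0 = cond({"y": 1}, {"z": 0})
p_y1_given_z0_w0 = cond({"y": 1}, {"z": 0, "w": 0})

print("f(y=1) =", p_y1, " f(y=1|w=0) =", p_y1_given_w0)
print("f(y=1|z=0) =", p_y1_given_z0, " f(y=1|z=0,w=0) =", p_y1_given_z0_w0)
```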
We consider three special cases: a normal model and two types of logit/probit-normal mixtures.
Example 2. Suppose $(y, z|w)$ has a bivariate normal distribution as in (3) and (4) or (5) and (6) with a positive definite covariance matrix. If $\gamma_{yw\cdot z} = 0$ while $\gamma_{zw} \neq 0$, then Theorem 1 implies a unique Markov structure. If in addition $\gamma_{yz} \neq 0$, then Theorem 2 and Lemma 3 imply a non-trivial Markov structure $\underline{w} - z - y$ and a non-trivial transmission $\underline{w} - y$. When $w$ is interpretable as a catalyst, then $w \to z \to y$.
Example 3. Suppose $y$ is binary and $(y, z|w)$ satisfies a logit-normal mixture model or a probit-normal mixture model. That is, the conditional distribution $(y|z, w)$ satisfies then Theorem 1 implies a unique Markov structure. If in addition $\gamma_{yz} \neq 0$, then Theorem 2 and Lemma 4 imply a non-trivial Markov structure $\underline{w} - z - y$ and a non-trivial transmission $\underline{w} - y$. When $w$ is interpretable as a catalyst, then $w \to z \to y$.
We note that in this situation, f(y|z, w) is much easier to work with than f(z|y, w). Due to Theorem 1, we only need to check the first instance to narrow the potential orderings of the system (y, z|w).

Example 4. Suppose z is binary and
, then Theorem 1 implies a unique Markov structure. If in addition $\gamma_{yz} \neq 0$, then Theorem 2 and Lemma 2 imply a non-trivial Markov structure $\underline{w} - z - y$ and a non-trivial transmission $\underline{w} - y$. When $w$ is interpretable as a catalyst, then $w \to z \to y$.

Multiple Causal Transmissions
The concept of causal transmission generalizes to multiple catalysts that may flow through the system in different ways. For notational convenience, we present this by augmenting the linear, normal system (1) with two distinct catalysts $w_1, w_2$ so that
$$y = \gamma_{y1} w_1 + \gamma_{y2} w_2 + \varepsilon_y, \qquad z = \gamma_{z1} w_1 + \gamma_{z2} w_2 + \varepsilon_z, \tag{21}$$
where $f(\varepsilon_y, \varepsilon_z|w) = f(\varepsilon_y, \varepsilon_z)$ is normal as in (2). The variables $w_1, w_2$ are observable and may represent two types of shocks to the economy at different points in time.
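We now set up the two possibilities for ordering $y, z$ through conditioning. Conditioning $y$ on $z$ gives
$$y = \gamma_{yz} z + \gamma_{y1\cdot z} w_1 + \gamma_{y2\cdot z} w_2 + \varepsilon_{y\cdot z}, \tag{22}$$
$$z = \gamma_{z1} w_1 + \gamma_{z2} w_2 + \varepsilon_z, \tag{23}$$
where $\varepsilon_{y\cdot z}, \varepsilon_z$ are independent and $\gamma_{yz} = \sigma_{yz}/\sigma_{zz}$, while conditioning $z$ on $y$ gives
$$y = \gamma_{y1} w_1 + \gamma_{y2} w_2 + \varepsilon_y, \tag{24}$$
$$z = \gamma_{zy} y + \gamma_{z1\cdot y} w_1 + \gamma_{z2\cdot y} w_2 + \varepsilon_{z\cdot y}, \tag{25}$$
where $\varepsilon_{z\cdot y}, \varepsilon_y$ are independent and $\gamma_{zy} = \sigma_{zy}/\sigma_{yy}$. Assuming $w_1, w_2$ are catalysts, we obtain two causal transmission hypotheses
$$H_1: \; w_1 \to z \to y, \quad \text{that is } \gamma_{y1\cdot z} = 0, \; \gamma_{z1} \neq 0, \; \gamma_{yz} \neq 0, \tag{26}$$
$$H_2: \; w_2 \to y \to z, \quad \text{that is } \gamma_{z2\cdot y} = 0, \; \gamma_{y2} \neq 0, \; \gamma_{zy} \neq 0. \tag{27}$$
When the hypotheses $H_1$ and $H_2$ are both satisfied, we obtain causal transmissions in opposite directions, which we represent by superimposing two directed graphs
$$w_1 \to z \rightleftarrows y \leftarrow w_2.$$
The joint restrictions imposed by $H_1 \cap H_2$ are possibly best expressed in terms of the original system (21) as:
$$\gamma_{y1} = \gamma_{yz}\gamma_{z1}, \qquad \gamma_{z2} = \gamma_{zy}\gamma_{y2}. \tag{28}$$
Written in a vector format, we have the reduced-form model
$$\begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} \gamma_{yz}\gamma_{z1} & \gamma_{y2} \\ \gamma_{z1} & \gamma_{zy}\gamma_{y2} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_y \\ \varepsilon_z \end{pmatrix}, \tag{29}$$
where all coefficients in the conditional expectation are non-zero.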

Detection of Outliers and Catalysts
In practice, catalysts may be discoverable from the empirical analysis of observational data. For this purpose, Hendry and Santos (2010) give an algorithm for discovering super exogeneity. This exploits the Autometrics algorithm in OxMetrics, see Doornik (2009) and Hendry and Doornik (2014).
This algorithm generalizes the robustified least squares approach used by Hendry and Mizon (1993) in their UK money analysis and by Hendry (1999) in his analysis of US food demand. A theory for analyzing such algorithms is gradually emerging. Indeed, a statistical theory for robustified least squares is presented in Hendry et al. (2008) and Johansen and Nielsen (2009, 2016).

Structural Considerations
The causal transmission concept unites ideas from Cholesky decompositions within structural vector autoregressions with ideas from instrumental variable estimation. We explore how causal transmission arises as a special case in those two settings. The idea is that we define economic structure conditional on a variable $w$. This variable rather than the innovations will play the role of structural shocks and it will have features in common with instruments in traditional simultaneous equations models. The variable $w$ could be an indicator variable for a particular event. It can arise from substantive considerations or it can potentially be found by outlier detection algorithms. We draw comparisons with the more restrictive concept of super exogeneity before delving into a structural interpretation when multiple catalysts are available.

Sims (1980) used vector autoregressions to address the haphazard accumulation of restrictions to achieve identification in the large simultaneous equation models of the time. This approach has evolved into the frequently used structural vector autoregressive (SVAR) approach, where a structural model is identified from the reduced form. In its basic form, this involves a recursive ordering of the variables. We will discuss how Cholesky decomposition relates to causal transmission.

Cholesky Decomposition
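It is well known that, while useful, recursive orderings are not unique. Causal transmission takes its starting point in recursive orderings but uses a catalyst to establish a unique ordering. If we ignore dynamic features, we can explore this using the setup in Section 2. The reduced-form system for the variables $y, z$ given $w$ is then given by (1). Pre-multiplying that system by a square matrix $A$ gives a structural model
$$\begin{pmatrix} a_{1y} & a_{1z} \\ a_{2y} & a_{2z} \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} b_{1w} \\ b_{2w} \end{pmatrix} w + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}, \tag{30}$$
where $e = A\varepsilon$ has covariance $\Omega_e = A\Sigma A'$ and where $\Sigma$ is the covariance matrix in (2). A structural model of this general form is not identifiable from the reduced-form model. We therefore consider two Cholesky decompositions where $A$ is triangular and $\Omega$ is diagonal. The first possibility is
$$\begin{pmatrix} 1 & a_{1z} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} b_{1w} \\ b_{2w} \end{pmatrix} w + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}, \qquad \Omega^z = \begin{pmatrix} \omega^z_{11} & 0 \\ 0 & \omega^z_{22} \end{pmatrix}, \tag{31}$$
which is identifiable from (3) and (4), when $a_{1z} = -\gamma_{yz} = -\sigma_{yz}/\sigma_{zz}$, while $\omega^z_{11} = \sigma_{yy\cdot z}$ and $\omega^z_{22} = \sigma_{zz}$. The second possibility is
$$\begin{pmatrix} 1 & 0 \\ a_{2y} & 1 \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} b_{1w} \\ b_{2w} \end{pmatrix} w + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}, \qquad \Omega^y = \begin{pmatrix} \omega^y_{11} & 0 \\ 0 & \omega^y_{22} \end{pmatrix}, \tag{32}$$
which is identifiable from (5) and (6), when $a_{2y} = -\gamma_{zy} = -\sigma_{zy}/\sigma_{yy}$, while $\omega^y_{11} = \sigma_{yy}$ and $\omega^y_{22} = \sigma_{zz\cdot y}$. The Cholesky forms (31) and (32) are observationally equivalent.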
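The observational equivalence of the two Cholesky forms can be illustrated numerically. The sketch below assumes an illustrative covariance matrix that is not taken from the paper, and the helper names `matmul` and `transpose` are hypothetical. It builds the two triangular matrices $A$, checks that $A\Sigma A'$ is diagonal in both cases, and recovers the same $\Sigma$ from either form.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Illustrative reduced-form covariance of (eps_y, eps_z) (assumption).
s_yy, s_yz, s_zz = 2.0, 0.8, 1.0
sigma = [[s_yy, s_yz], [s_yz, s_zz]]

# First form: upper triangular A with a_1z = -sigma_yz / sigma_zz.
A_z = [[1.0, -s_yz / s_zz], [0.0, 1.0]]
# Second form: lower triangular A with a_2y = -sigma_zy / sigma_yy.
A_y = [[1.0, 0.0], [-s_yz / s_yy, 1.0]]

omega_z = matmul(matmul(A_z, sigma), transpose(A_z))
omega_y = matmul(matmul(A_y, sigma), transpose(A_y))

# The inverse of a unit triangular 2x2 matrix flips the off-diagonal sign.
A_z_inv = [[1.0, s_yz / s_zz], [0.0, 1.0]]
A_y_inv = [[1.0, 0.0], [s_yz / s_yy, 1.0]]
sigma_from_z = matmul(matmul(A_z_inv, omega_z), transpose(A_z_inv))
sigma_from_y = matmul(matmul(A_y_inv, omega_y), transpose(A_y_inv))

print("Omega^z:", omega_z)
print("Omega^y:", omega_y)
```

Both forms imply the same reduced-form covariance, so the data alone cannot choose between them; the catalyst restrictions are what break the tie.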
Using the causal transmission analysis, we may find, for instance, that $w \to z \to y$ in the reduced-form model. This is consistent with the first Cholesky form (31) with the additional restriction that $b_{1w} = 0$, that is:
$$\begin{pmatrix} 1 & a_{1z} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ b_{2w} \end{pmatrix} w + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}, \tag{33}$$
where the errors $e_1$ and $e_2$ are independent. This model is asymmetric. It shows how economic shocks in $z$ can transmit to the structural relation $y + a_{1z} z$. Subtly, the asymmetry is captured by $w$ rather than the errors $e_1, e_2$, which have a symmetric role. Thus, the interpretation of this structural model is that it shows how, typically, large shocks of the type $w$ move through the economy, which is also subject to, typically, small shocks of the type $e_1, e_2$. For instance, $w$ may represent the onset of a major economic crisis or a major government intervention, while the shocks $e_1, e_2$ represent the minor, daily pulling and pushing forces in the economy. Thus, the structural assumption we need for this analysis is that $w$ is a catalyst. The remaining features of the causal transmission $w \to z \to y$ are testable and discoverable from reduced-form analysis. In Section 3.3, we extend this analysis to a situation with multiple catalysts.

Instrumental Variable Estimation
The traditional simultaneous equations model has no causal direction. Instead, the focus is to estimate the behavioral equations with the aid of instruments. We discuss this in the context of a simple demand and supply example, with a focus on the demand curve.
Consider the following demand function formulated in terms of the (log) quantity $q$ and the (log) price level $p$:
$$q = a_0 + a_1 p + u_d. \tag{34}$$
The demand function can be identified with the use of an instrument $w$ that is valid, $E(w u_d) = 0$, and informative, $E(wp) \neq 0$. This corresponds to a supply shock and gives the first-stage equation
$$p = b_0 + b_1 w + u_p, \tag{35}$$
implying an exclusion restriction in the demand equation. The demand function describes the linear relation between prices and quantities. The variables are jointly determined, so there is no causal direction between them. Thus, the Equation (34) can be reversed as
$$p = -\frac{a_0}{a_1} + \frac{1}{a_1} q - \frac{1}{a_1} u_d. \tag{36}$$
This is reflected when estimating with limited information maximum likelihood. In that case, the product of the estimate for $a_1$ in Equation (34) and the estimate for $1/a_1$ in Equation (36) is indeed unity. This applies both in the just-identified case where $w$ is univariate and in the over-identified case where $w$ is multivariate.
In general, the structural error $u_d$ in (34) and the first-stage error $u_p$ in (35) may be conditionally dependent given $w$ in the instrumental variable problem. However, if the two errors are indeed conditionally independent, then the demand Equation (34) represents the conditional distribution of $q$ given $p, w$ with the property $f(q|p, w) = f(q|p)$ so that the first condition of Theorem 1 is met. The second condition of Theorem 1 ensures the informativeness of $w$ in the first-stage Equation (35), that is $b_1 \neq 0$. Theorem 1 then shows that the reversed demand Equation (36) cannot represent the conditional distribution of $p$ given $q, w$, and there is a unique Markov structure $\underline{w} - p - q$: the demand equation does not depend on $w$, but the conditional distribution must depend on $w$. In the case of normal errors, $\underline{w} - p - q$ implies $\underline{w} - q$ by Lemma 3. With the additional assumption that $w$ is a catalyst, we arrive at the causal transmission $w \to p \to q$.
We can also start from a reduced-form model and discover Markov structures without imposing structural assumptions. Ignoring intercepts, the starting point is the reduced-form system (1), that is
$$q = \gamma_{qw} w + \varepsilon_q, \qquad p = \gamma_{pw} w + \varepsilon_p, \tag{37}$$
where $f(\varepsilon_q, \varepsilon_p|w) = f(\varepsilon_q, \varepsilon_p)$ is normal. Here, $w$ is merely a conditioning variable, albeit a candidate for an instrument. The reduced-form system implies an equation that does not depend on the exogenous $w$,
$$q = \frac{\gamma_{qw}}{\gamma_{pw}} p + u, \qquad u = \varepsilon_q - \frac{\gamma_{qw}}{\gamma_{pw}} \varepsilon_p, \tag{38}$$
when $\gamma_{pw} \neq 0$ so that the instrument is informative. The error term $u$ in Equation (38) has the property that it is independent of the instrument $w$ since $f(\varepsilon_q, \varepsilon_p|w) = f(\varepsilon_q, \varepsilon_p)$ implies $f(u|w) = f(u)$. We note that in this just-identified setup, the ratio of least squares estimators for $\gamma_{qw}$ and $\gamma_{pw}$ is the indirect least squares estimator, which is the same as the two-stage least squares estimator or limited information maximum likelihood estimator. If the slope $\gamma_{qw}/\gamma_{pw}$ is negative and if we can interpret $w$ as a supply shock, then Equation (38) can be interpreted as a demand equation. By imposing the additional, testable restriction that $u$ and $\varepsilon_p$ are independent or, equivalently, that $\gamma_{qw}/\gamma_{pw} = \mathrm{Cov}(\varepsilon_q, \varepsilon_p)/\mathrm{Var}(\varepsilon_p)$, the unique Markov structure $\underline{w} - p - q$ is obtained by Theorem 1. If the instrument $w$ can be viewed as a catalyst, it transmits causally through $p$ to the traded quantity $q$. The likelihood ratio test for the hypothesis of independence of $u$ and $\varepsilon_p$ can be approximated by the Hausman test for endogeneity.

As a causal concept, causal transmission is modest in scope: all causal orderings are relative to particular interventions with no attempt to give an overall causal ordering of the variables of interest, $y, z$. The concept is more modest than the causal inference interpretation of quasi-experiments, where the difference of potential and realized outcomes is estimated using an instrumental variable approach and the causal language from randomized controlled trials is applied, see Imbens (2014).
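The equality between the ratio of reduced-form least squares estimators and the instrumental variable estimator in the just-identified case can be seen in a small simulation. This is a minimal sketch, assuming a hypothetical data-generating process without intercepts; the parameter values, the binary instrument, and the helper `dot` are illustrative assumptions and not the paper's empirical setup.

```python
import random

random.seed(0)
n = 500
a1 = -1.0        # hypothetical demand slope (assumption)
g_pw_true = 0.5  # hypothetical first-stage coefficient (assumption)

# Binary instrument, e.g. an indicator for a supply shock.
w = [float(random.random() < 0.5) for _ in range(n)]
e_p = [random.gauss(0.0, 1.0) for _ in range(n)]
# Demand error correlated with the price error, so p is endogenous.
u_d = [0.6 * e + random.gauss(0.0, 1.0) for e in e_p]
p = [g_pw_true * wi + ei for wi, ei in zip(w, e_p)]
q = [a1 * pi + ui for pi, ui in zip(p, u_d)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Least squares through the origin for each reduced-form equation.
g_qw_hat = dot(w, q) / dot(w, w)
g_pw_hat = dot(w, p) / dot(w, w)

ils = g_qw_hat / g_pw_hat      # indirect least squares
iv = dot(w, q) / dot(w, p)     # instrumental variable estimator

print("indirect least squares:", ils)
print("instrumental variable: ", iv)
```

The two estimators coincide algebraically because the common factor, the sum of squares of $w$, cancels in the ratio.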
Rather than conducting causal inference under an assumption of causal transmission, we are interested in conducting inference about the causal transmission itself. In practice, the consequence is that it becomes clearer that results can only be extrapolated to future interventions insofar as those interventions are comparable with the interventions in the sample.
An empirical illustration of this instrumental variable setup is the analysis of the Fulton Fish market data by Hendry and Nielsen (2007). This uses the data collected and analyzed by Graddy (1995) and Angrist et al. (2000). For those data, q and p would be log aggregated daily quantities and prices of whiting while w is an indicator variable for the stormy/fair weather at sea where the fish is caught.

External Instruments
Montiel Olea et al. (2021) identify impulse response functions within structural vector autoregressions by finding instruments for the structural shocks. For comparison, we ignore the lagged dependent variable, which does not play any particular role in the argument, and focus on the first structural shock. The structural model is where the structural shocks $e_{1,t}$ and $e_{2,t}$ are assumed to have a diagonal covariance matrix $\Omega_e = \mathrm{diag}(\omega_{11}, \omega_{22})$. The coefficient $H_{12}$ is of interest, but it is not identifiable from the reduced-form representation. The identification strategy sought here is through an external instrument $w_t$ that satisfies $E(e_{1,t} w_t) = \alpha \neq 0$ for informativeness and $E(e_{2,t} w_t) = 0$ for validity; diagonality of $\Omega_e$ is required for identification of the structural shocks but not for identification of the impulse response function. The coefficient $H_{12}$ can then be found from the covariance of $(y_t, z_t)$ and $w_t$. It is useful to express the assumptions through the joint density $f(e_1, e_2, w)$. If normality is assumed, for instance, the requirements on the instrument and the structural shocks are that This approach identifies impulse response functions but does not impose any causal ordering. An ordering would require either $H_{12} = 0$ or $H_{21} = 0$. The instrumental variable assumptions placed on the structural model, however, do not imply either of these situations; they are necessary but not sufficient for causal transmission of $w$, since no asymmetry is introduced between the endogenous variables.
The recursive ordering achieved through Cholesky decomposition with $H_{21} = 0$ gives direction but no uniqueness as it is observationally equivalent to swapping the roles of the variables $y$ and $z$. The external instrument approach with $E(e_{1,t} w_t) = \alpha$ gives uniqueness but no direction; additionally, imposing $H_{21} = 0$ would yield a causal transmission $w \to y \to z$, but imposing $H_{12} = 0$ does not due to $E(e_{2,t} w_t) = 0$. In both identification approaches, the restrictions introduce an asymmetry in the structural model. For example, external instruments impose the asymmetry on the joint distribution of the unobserved structural errors $e_{1,t}, e_{2,t}$ and the instrument $w_t$. The restrictions are sufficient to just identify the structural model from the reduced-form model that remains symmetric. It seems necessary to impose the asymmetry on the reduced-form model in order to ensure a unique direction.

Multiple Causal Transmissions
The possibility of multiple causal transmissions was explored in Section 3.3. This was performed in a reduced-form model. Here, we explore the structural interpretation.
The setup is the linear normal system (21) with two distinct catalysts. When the restrictions $H_1$ and $H_2$ defined in (26) and (27) are both satisfied, we obtain causal transmissions in opposite directions, $w_1 \to z \to y$ and $w_2 \to y \to z$. The restricted reduced-form model is then given by (29). Following the considerations in Section 4.1, a corresponding structural model is
$$y = \gamma_{yz} z + \delta_{12} w_2 + \varepsilon_{y\cdot z}, \qquad z = \gamma_{zy} y + \delta_{21} w_1 + \varepsilon_{z\cdot y}, \tag{42}$$
where $\gamma_{yz} = \sigma_{yz}/\sigma_{zz}$ and $\gamma_{zy} = \sigma_{zy}/\sigma_{yy}$ are multipliers for the catalysts, while $\delta_{21} = (1 - \rho^2)\gamma_{z1}$ and $\delta_{12} = (1 - \rho^2)\gamma_{y2}$, with $\rho^2 = \sigma_{yz}^2/(\sigma_{yy}\sigma_{zz})$. The innovations of the structural Equation (42) satisfy
$$\mathrm{Var}\begin{pmatrix}\varepsilon_{y\cdot z}\\ \varepsilon_{z\cdot y}\end{pmatrix} = \begin{pmatrix}\sigma_{yy\cdot z} & -\sigma_{yz}(1-\rho^2)\\ -\sigma_{yz}(1-\rho^2) & \sigma_{zz\cdot y}\end{pmatrix}, \tag{43}$$
with correlation $-\rho = -\sigma_{yz}(1 - \rho^2)/(\sigma_{yy\cdot z}\sigma_{zz\cdot y})^{1/2}$.
We have identified a structural model with respect to catalysts $w_1$ and $w_2$ without imposing any ad hoc restrictions on the causal ordering through the covariance matrix. The catalysts are orthogonal to each other in the structural model in the sense that $w_1$ is omitted from the first structural equation and $w_2$ is omitted from the second structural equation. Structure is, therefore, identified as a linear relationship that remains invariant to large shocks. Rather than imposing structure to identify orthogonal shocks, we use shocks to identify structure. Instead of having a structural model that is ordered for an entire sample, we are only concerned with ordering during periods when large interventions take place. We note that if the parameters $\gamma_{yz}, \gamma_{zy}$ of the system (42) were unrelated to the covariance parameters $\sigma_{yy}, \sigma_{yz}, \sigma_{zz}$ in (43), we would have a just-identified and undirected bivariate simultaneous equations model.
Causal transmission in both directions, depending on the type of shock, seems compatible with the discussion of shocks in macroeconomics. In many situations, we use indicator variables to represent large external shocks to the economy. When large external shocks arrive in quick succession, it may be difficult to separate the effects of the individual shocks.
A pertinent example is the beginning of the financial crisis in 2007-2008 when oil shocks, financial collapse and large fiscal and monetary policy interventions occurred in quick succession. We envisage that it would be possible to disentangle the effect of these shocks by lining these up, individually, with shocks at other points in time.
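The covariance algebra for the structural innovations can be checked numerically. The following is a minimal sketch, assuming the structural innovations are obtained from the reduced-form innovations as $y - \gamma_{yz} z$ and $z - \gamma_{zy} y$; the function name and the covariance values are hypothetical and chosen only for illustration.

```python
import numpy as np

def structural_innovation_moments(sigma_yy, sigma_yz, sigma_zz):
    """Covariance matrix and correlation of (y - g_yz * z, z - g_zy * y)."""
    sigma = np.array([[sigma_yy, sigma_yz], [sigma_yz, sigma_zz]])
    g_yz = sigma_yz / sigma_zz  # coefficient from regressing y on z
    g_zy = sigma_yz / sigma_yy  # coefficient from regressing z on y
    # Each row maps the reduced-form innovations into one structural innovation.
    b = np.array([[1.0, -g_yz], [-g_zy, 1.0]])
    omega = b @ sigma @ b.T
    corr = omega[0, 1] / np.sqrt(omega[0, 0] * omega[1, 1])
    return omega, corr

# Hypothetical covariance parameters, not estimates from the paper.
sigma_yy, sigma_yz, sigma_zz = 2.0, 0.6, 1.5
omega, corr = structural_innovation_moments(sigma_yy, sigma_yz, sigma_zz)
rho = sigma_yz / np.sqrt(sigma_yy * sigma_zz)
print(corr, -rho)  # the two numbers agree up to rounding error
```

The diagonal of `omega` reproduces the conditional variances $\sigma_{yy\cdot z}$ and $\sigma_{zz\cdot y}$, and the correlation equals $-\rho$ for any positive definite choice of the three covariance parameters.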

Super Exogeneity
The concept of super exogeneity by Engle et al. (1983) is formulated in the context of a statistical model with density $f_{\lambda_1}(y_t \mid z_t) f_{\lambda_2}(z_t)$ for $t = 1, \dots, T$ and with parameters varying in some parameter space. The parameters $\lambda_1, \lambda_2$ satisfy a sequential cut property if they are variation free, so that maximizing the conditional (partial) likelihood for $y_t$ given $z_t$ and the marginal (partial) likelihood for $z_t$ separately delivers the overall maximum likelihood. This idea was exploited by Fisher (1922) in a non-dynamic context. A model user may only be interested in a parameter $\psi = f(\lambda_1, \lambda_2)$. If the parameters are variation free and the parameter of interest is only a function of $\lambda_1$, the variable $z_t$ is said to be weakly exogenous for $\psi$. Engle et al. (1983) proceed to say that the parameters may change over time. A conditional model is said to be structurally invariant if all its parameters are invariant to any change in the distribution of the conditioning variables. Further, $z_t$ is said to be super exogenous for $\psi$ if $z_t$ is weakly exogenous for $\psi$ and the conditional model is structurally invariant.
It is useful to contrast our theory and the notion of super exogeneity. Our theory is concerned with a single distribution rather than a statistical model, which is a parametrized family of distributions. Parameters are therefore not involved. In the examples, distributions have been expressed in terms of coefficients which are thought of as having a single value. In practical implementation, these coefficients will usually be replaced by parameters which are to be estimated. By avoiding a link between causal transmission and parameters, the sequential cut property is not essential and causal transmission can run counter to a sequential cut direction. It will also be possible to have causal transmission of different types of shocks in different directions.
Co-breaking is related to both super exogeneity and causal transmission, see Hendry and Massmann (2007). If the variables $y_t, z_t$ have level shifts, but a linear relation of the variables does not have level shifts, the variables are said to co-break. With co-breaking, the linear relation need not coincide with a conditional relation.

The Multiple Causal Transmissions Model
The most complicated construction in this paper is the multiple causal transmissions explored in Sections 3.3 and 4.4, as this involves two endogenous variables and two shocks. We draw some parallels to the graphical model literature and give some remarks on exponential family properties and, hence, on estimation.
We consider the linear, normal system (21) where the innovation density $f(\varepsilon_y, \varepsilon_z \mid w) = f(\varepsilon_y, \varepsilon_z)$ is normal as in (2).

Graphical Models
Our terminology of causal transmissions leads to the following considerations concerning graphical notation. Assuming $w_1, w_2$ are catalysts, we write the graph (45) under the conditional independence constraints (46) along with the dependence conditions (47).
Markov properties for chain graphs take various forms in the literature, see Drton (2009). Here, we focus on the first two types described by Drton. We start with the alternative Markov property by Andersson et al. (2001), as these authors have an introductory example resembling the present situation. They operate in the joint distribution of $y, z, w_1, w_2$ starting from (44) and the assumption that $w_1, w_2$ are bivariately normal. They use the graph (48) on the vertices $w_1, z, y, w_2$ to describe the situation where $\gamma_{y1} = 0$ and $\gamma_{z2} = 0$ while $\mathrm{Cov}(w_1, w_2) = 0$.
These constraints pertain to the parameters in the joint model for $y, z$ in (44), rather than the parameters in the conditional distributions. Lauritzen and Wermuth (1989) and Frydenberg (1990) present a block concentration Markov property. Following Andersson et al. (2001), this would imply using the graph in (48) to describe the situation where $\gamma_{y1\cdot z} = 0$ and $\gamma_{z2\cdot y} = 0$ while $\mathrm{Cov}(w_1, w_2) = 0$.
The zero constraints on these two regression coefficients match the constraints in (46) and produce a Markov structure. Our approach, however, also requires the dependencies (47), so the graph (45) indicates the flow of catalysts through $y, z$ under a causal assumption. By contrast, the graph (48) indicates certain Markov structures only. Moreover, our approach is silent on the distribution of the catalysts.

Exponential Family Properties
The unrestricted model (44) is known to be a regular exponential family. To see this, introduce the vector notation $x = (y, z)'$ and $w = (w_1, w_2)'$ for the observations and the matrix notation $\gamma$ and $\Sigma$ for the regression coefficients and the covariance parameters. The unrestricted density can then be written in exponential family form. The canonical parameter consists of $\Sigma^{-1}$ and $\Sigma^{-1}\gamma$. With i.i.d. repetitions of $y_i, z_i, w_i$, the sufficient statistic consists of $\sum_{i=1}^n x_i x_i'$ and $\sum_{i=1}^n w_i x_i'$. The dimensions of the canonical parameter and the sufficient statistic match, so the exponential family is regular.
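The exponential family form can be sketched as follows, assuming that $x$ given $w$ is bivariate normal with mean $\gamma w$ and variance $\Sigma$, where $\gamma$ is the $2 \times 2$ matrix of regression coefficients. The absence of an intercept and the particular arrangement of $\gamma$ are assumptions made here for brevity and may differ from the numbered equations of the paper.

```latex
\begin{align*}
\log f(x \mid w)
  &= -\log(2\pi) - \tfrac12 \log\det\Sigma
     - \tfrac12 (x - \gamma w)' \Sigma^{-1} (x - \gamma w) \\
  &= -\tfrac12 \operatorname{tr}\bigl(\Sigma^{-1} x x'\bigr)
     + \operatorname{tr}\bigl\{(\Sigma^{-1}\gamma)\, w x'\bigr\}
     - \tfrac12 w' \gamma' \Sigma^{-1} \gamma w
     - \tfrac12 \log\det\Sigma - \log(2\pi).
\end{align*}
```

Under this sketch, the data enter only through $x x'$ and $w x'$, paired with $\Sigma^{-1}$ and $\Sigma^{-1}\gamma$, respectively. The symmetric matrix $\Sigma^{-1}$ has three free elements and $\Sigma^{-1}\gamma$ has four, which matches the three distinct elements of $\sum_{i=1}^n x_i x_i'$ and the four elements of $\sum_{i=1}^n w_i x_i'$.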
Writing out the canonical parameter $\Sigma^{-1}\gamma$ in detail shows that the conditional independence constraints (46) set its off-diagonal elements to zero, see also Andersson et al. (2001). The exponential family constrained by (46) therefore remains regular. The restricted density can now be written in terms of $\theta$ and $t$, where $\theta$ is the vector of diagonal elements in $\Sigma^{-1}\gamma$ and $t = (w_1 y, w_2 z)'$. Thus, the dimensions of the canonical parameter and the sufficient statistic match in an i.i.d. model. In particular, the likelihood is concave with a unique maximum, see Sundberg (2019, §3.2). We note in passing that the first two constraints under the alternative Markov property model (49) correspond to setting the off-diagonal elements of $\gamma$ to zero. We therefore obtain a system of seemingly unrelated regressions. The constraints amount to a nonlinear constraint on the canonical parameter $\Sigma^{-1}, \Sigma^{-1}\gamma$, resulting in a curved exponential family. The concavity property of the likelihood is lost, and it may have multiple maxima, see van Garderen (1997) and Drton and Richardson (2004).

Empirical Example
We illustrate the causal transmission using the simplified bivariate model of money demand for the UK in Hendry and Nielsen (2007). This has the convenient features of being bivariate, reasonably well-specified and with two catalysts operating in opposite directions. The data are formed from quarterly observations of log M1 money $m$, log real total final expenditure $x$, its log deflator $p$ and a constructed net interest rate $R_n$ taken from Hendry and Mizon (1993) over the period from 1963:2 to 1989:2. This in turn builds on Hendry and Ericsson (1991). To simplify the analysis, we convert the four variables into a bivariate system, modeling the velocity of circulation of money $v$ and the cost of holding money $C$ through $v_t = x_t - m_t + p_t$ and $C_t = \Delta p_t + R_{n,t}$.
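The two transformations can be sketched in a few lines. This is a minimal illustration, assuming the four series are available as equal-length arrays; the function name and the numbers are hypothetical and are not the UK data.

```python
import numpy as np

def velocity_and_cost(m, x, p, r_net):
    """Velocity v = x - m + p and cost of holding money C = diff(p) + r_net."""
    m, x, p, r_net = (np.asarray(a, dtype=float) for a in (m, x, p, r_net))
    v = x - m + p
    # First difference of the log deflator; the first period has no lagged value.
    dp = np.full_like(p, np.nan)
    dp[1:] = np.diff(p)
    c = dp + r_net
    return v, c

# Hypothetical log levels and interest rates for three quarters.
m = [10.0, 10.1, 10.3]
x = [11.0, 11.0, 11.1]
p = [0.00, 0.02, 0.05]
r_net = [0.05, 0.06, 0.04]
v, c = velocity_and_cost(m, x, p, r_net)
```

The first element of `c` is left missing because the inflation rate $\Delta p_t$ is undefined for the first observation, which is one reason an estimation sample starts later than the raw data.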
We show how the results from the previous sections may be applied in practice to identify multiple causal transmissions. Subsequently, we provide impulse responses for the interventions that are identified. Finally, we address the Lucas critique that asserts that an econometric model may be unstable under changing conditions. The subsequent computations were carried out in MATLAB (2014) and PcGive (Doornik and Hendry 2013).

The Unrestricted Reduced-Form
Figure 1 shows $v_t, C_t$ in levels and differences. The transformed data series are non-stationary, but their first-order differences have a more stationary appearance. The plots also show two dummy variables $w_{\mathrm{out},t}, w_{\mathrm{oil},t}$ representing large fiscal expansions in 1972:4-1973:1 and 1979:2 as well as the oil price shocks in 1973:3-4 and 1979:3. They will later be interpreted as catalysts.
Figure 1. Levels and first-differences of the variables in the system $(v, C)$ plotted with the selected outliers $(w_{\mathrm{out}}, w_{\mathrm{oil}})$.
Whereas the oil shocks are clearly exogenous to the UK economy, this is less obvious for the fiscal expansions. In fact, what we call fiscal shocks are the expansionary budget of 1972 proposed by Anthony Barber, then Chancellor of the Exchequer, and a significant VAT reduction in 1979. While both are likely endogenous to the UK economy, neither shock was set up to influence money demand, and so both can be considered exogenous for the bivariate model of (v, C). Furthermore, while the shocks are different in principle, the effect is the same so that it becomes possible to extend our conclusions over several types of shocks that are expansionary in nature.
The dummy variables are taken from Hendry and Mizon (1993). They were originally found through a residual analysis as large outliers. By including dummies for these particular observations, the remaining observations appear to match a normal reference distribution, and the model passes standard specification tests including recursive tests. At the same time, these dummies have interpretation as interventions and are in this respect related to the historical narrative approach of Romer and Romer (2010).
The initial specification is a second-order vector autoregressive model for $(v_t, C_t)$ that includes the two dummy variables $w_{\mathrm{out},t}$ and $w_{\mathrm{oil},t}$. In the $C_t$ equation, for instance, the dummies and the innovation enter as $\gamma_{C,\mathrm{out}} w_{\mathrm{out},t} + \gamma_{C,\mathrm{oil}} w_{\mathrm{oil},t} + \varepsilon_{C,t}$.
The estimated model is the joint model reported in equilibrium-correction form in the first two columns of Table 1. The innovations $\varepsilon_{v,t}, \varepsilon_{C,t}$ are assumed i.i.d. jointly normal with zero mean and independent of the current and past regressors. Specification tests are reported in Table 2. The residual specification tests include a cumulant-based test $\chi^2_{\mathrm{norm}}$ for normality, a test $F_{\mathrm{ar}}$ for autoregressive temporal dependence (Godfrey 1978), a test $F_{\mathrm{arch}}$ for autoregressive conditional heteroscedasticity (Engle 1982), a test $F_{\mathrm{het}}$ for heteroscedasticity (White 1980) and a test max Chow based on the maximum of recursive 1-step-ahead Chow (1960) forecast test statistics. We will benefit from this recursive test in Section 6.6. The above references only consider static or stationary models, but the specification tests also apply for non-stationary autoregressions, see Kilian and Demiroglu (2000) for $\chi^2_{\mathrm{norm}}$, Nielsen (2006) for $F_{\mathrm{ar}}$ and Nielsen and Whitby (2015) for max Chow. We see that the specification for the velocity equation is very good, while the specification for the cost equation is less good but tolerable. The two Chow tests take their maximum values in 1971:1 and in 1976:4. These dates correspond to the decimalization of the Pound and the debt intervention by the International Monetary Fund. Overall, these tests indicate that we cannot reject the model and that the innovations are independent and identically normally distributed.
The dummy variables play a dual role in the subsequent analysis. First, we need the dummy variables to achieve a reasonable specification of the econometric model. Without these, the residuals appear too irregular and we cannot perform valid inference. The chosen statistical model is based on the normal distribution and the observations captured by the dummy variables are outliers relative to this reference distribution. Second, the dummy variables help us to distinguish between large and small shocks. The large shocks occur infrequently, and they are often interpretable as catalysts. The above specification analysis indicates that the largest shocks after the oil crises and output expansions are the decimalization of the Pound in 1971:1 and the turmoil around the IMF intervention in 1976:4. In terms of fit, the results in Table 2 do not suggest that it is necessary to include dummies to represent these events. This could be followed up with a sensitivity analysis for the inference we draw about the oil shocks and the output expansion. For instance, does it make a difference to include a dummy for the decimalization? At the same time, we could include dummies for the decimalization and the IMF intervention to explore the transmission of those events. In other words, if we are concerned with a particular macroeconomic intervention we can to some extent search for similar interventions in the past and explore their transmission.

Causal Transmission in UK Money Demand Data
We now explore causal transmission. Table 1 reports the unrestricted reduced-form model in columns 1 and 2. This is a model for $v_t, C_t$ given dummies and the past. In the estimated model (56) and (57), we have assumed that the joint density of the innovations $\varepsilon_{v,t}, \varepsilon_{C,t}$ given contemporaneous and past regressors is i.i.d. zero mean jointly normal. When applying the theory of Section 3, the density $f(y, z \mid w)$ will represent the estimated innovation density. Depending on the context, $y, z$ will refer to $\varepsilon_{v,t}, \varepsilon_{C,t}$ in some order, $w$ will refer to one of the dummy variables $w_{\mathrm{out},t}, w_{\mathrm{oil},t}$ and the remaining regressors are ignored.
The effect of the oil price shocks can be explored by conditioning $v_t$ on $C_t$ and following Example 2. The conditional equation for $v_t$ given $C_t$ and the marginal equation for $C_t$ are reported in columns 3 and 2, respectively, in Table 1. The coefficient for $w_{\mathrm{oil},t}$ is insignificant in the conditional equation but significant in the marginal equation. Theorem 1 shows there exists a unique Markov structure such that $v_t$ and $w_{\mathrm{oil},t}$ are conditionally independent given $C_t$. Further, the coefficient for $\Delta C_t$ is significant in the conditional equation. Theorem 2 then shows the Markov structure is non-trivial, so $w_{\mathrm{oil},t}$-$C_t$-$v_t$. Lemma 3 then shows that the transmission between $w_{\mathrm{oil},t}$ and $v_t$ is non-trivial. Correspondingly, the coefficient for $w_{\mathrm{oil},t}$ is significant in the marginal $v_t$ equation. From an economic perspective, it seems reasonable to interpret the oil shocks as catalysts so that $w_{\mathrm{oil},t} \to C_t \to v_t$. The interpretation is that large oil price shocks move the cost of holding money and in turn the velocity.
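The pattern of significance described above can be illustrated on simulated data. The sketch below is not the analysis of Table 1: it assumes a static system in which a hypothetical dummy `w` shifts `z` and `y` depends on `w` only through `z`, and it uses a small hand-written least squares helper in place of an econometrics package.

```python
import numpy as np

def ols(y, regressors):
    """OLS with a constant; returns coefficients and t-statistics."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

rng = np.random.default_rng(0)
n = 20_000
w = (rng.random(n) < 0.1).astype(float)  # hypothetical catalyst dummy
z = 2.0 * w + rng.standard_normal(n)     # the catalyst shifts z
y = 0.5 * z + rng.standard_normal(n)     # y depends on w only through z

b_cond, t_cond = ols(y, [z, w])  # conditional equation for y given z
b_marg, t_marg = ols(z, [w])     # marginal equation for z
b_red, t_red = ols(y, [w])       # reduced-form equation for y

print("w in y | z:", b_cond[2], t_cond[2])
print("z in y | z:", b_cond[1], t_cond[1])
print("w in z    :", b_marg[1], t_marg[1])
print("w in y    :", b_red[1], t_red[1])
```

In this simulated design the dummy should be close to zero in the conditional equation while remaining clearly non-zero in the marginal and reduced-form equations, which is the configuration read off from the columns of Table 1 for the oil shocks.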
To illustrate the uniqueness result, we now consider the conditional equation for $C_t$ given $v_t$ in column 4 of Table 1. Here, $w_{\mathrm{oil},t}$ is significant, so we cannot have a Markov structure from $w_{\mathrm{oil},t}$ through $v_t$ to $C_t$. This is in line with Theorem 1.
Turning to the output shock, we condition $C_t$ on $v_t$. The conditional equation for $C_t$ given $v_t$ and the marginal equation for $v_t$ are reported in columns 4 and 1, respectively. We follow Example 2 again. The output dummy $w_{\mathrm{out},t}$ is significant in the marginal equation and insignificant in the conditional equation. Moreover, velocity, $\Delta v_t$, is significant in the conditional equation. Theorems 1 and 2 then show a non-trivial Markov structure $w_{\mathrm{out},t}$-$v_t$-$C_t$. Lemma 3 shows that the transmission is non-trivial. Interpreting $w_{\mathrm{out},t}$ as a catalyst, we then have $w_{\mathrm{out},t} \to v_t \to C_t$. Economically, large fiscal expansions may impact the velocity of money without having an impact on inflation straight away. The conclusion is, however, less clear than the causal transmission of the oil shocks. Indeed, in line with the discussion in Section 3.1.3, we check if the fiscal shock $w_{\mathrm{out},t}$ actually has a non-negligible effect on the cost of holding money. The coefficient in the $C_t$ equation has a t-statistic of 1.3, which at best shows marginal significance. Thus, we may very well have $w_{\mathrm{out},t} \to v_t \to C_t$, but evidence for this transmission is weaker than the evidence for the transmission of the oil shocks.

Imposing Multiple Catalysts
The two causal transmissions $w_{\mathrm{oil},t} \to C_t \to v_t$ and $w_{\mathrm{out},t} \to v_t \to C_t$ can be imposed individually. These are the hypotheses $H_1, H_2$ of (26) and (27). Imposing both gives the two-catalyst structure with transmissions in opposite directions described in Section 3.3. This is a system of seemingly unrelated regressions. When maximizing the likelihood, we chose to parametrize it in terms of $\sigma_{yy\cdot z}, \sigma_{zz\cdot y}, \rho$ and derive standard errors for $\gamma_{yz}$ and $\gamma_{zy}$ using the $\delta$-method.
The restricted model is reported in columns 5 and 6 of Table 1 in the structural form derived from Section 3.3. The likelihood ratio statistic for the two restrictions is $2(559.31 - 558.74) = 1.14$, which is not significant when compared to a $\chi^2_2$ distribution. The structural estimates largely match those of the conditional models in Table 1. Writing the model in structural form, it becomes very clear that the dummies $w_{\mathrm{out},t}, w_{\mathrm{oil},t}$ affect distinct linear combinations of the endogenous variables. The first structural equation is interpretable as the monetary quantity relation, showing how money demand reacts to output shocks, while the second structural equation is interpretable as a cost-push relation showing how money demand is driven by price shocks.
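The likelihood ratio comparison can be reproduced from the two reported log-likelihood values. The sketch below uses only the standard library and relies on the fact that the survival function of a chi-square distribution with two degrees of freedom is $\exp(-x/2)$; the function name is hypothetical.

```python
import math

def lr_test_two_restrictions(loglik_unrestricted, loglik_restricted):
    """Likelihood ratio statistic and its p-value under a chi-square(2) reference."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    # For two degrees of freedom, P(X > x) = exp(-x / 2).
    p_value = math.exp(-lr / 2.0)
    return lr, p_value

lr, p_value = lr_test_two_restrictions(559.31, 558.74)
print(round(lr, 2), round(p_value, 2))
```

The resulting p-value is roughly 0.57, far above conventional significance levels, in line with the statement that the two restrictions are not rejected.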

Cointegration
The velocity and cost of holding money variables are non-stationary and should possibly be subjected to a cointegration analysis. This is compatible with causal transmission.
Following the maximum likelihood setup of Johansen (1995), the cointegration model with rank one is formulated as an equilibrium-correction model. The model with multiple causal transmissions and cointegration imposed has a log-likelihood of 555.44. For present purposes, we merely consider the likelihood ratio test for the cointegration restriction within the model with multiple causal transmissions imposed. The test statistic is $2(558.74 - 555.44) = 6.60$, which should be compared to a 95% critical value of 9.1, see Johansen (1995, Table 15.2).
With a unit cointegration rank, the coefficients to $v_{t-1}, C_{t-1}$ are proportional across the equations. This results in the cointegrating relation $v_{t-1} = 6.239\,C_{t-1}$, which is interpretable as long-run money demand. The adjustment coefficient in the conditional equation for $v_t$ given $C_t$ is a modest 9.5% per quarter, whereas the adjustment in the marginal equation for $C_t$ is insignificant. We note that in a model without multiple causal transmissions imposed, the constraint $\alpha_C = 0$ would be a hypothesis of weak exogeneity, Johansen (1995, §8), but the weak exogeneity is broken when imposing the cross-equation restrictions implied by causal transmission.

Impulse Responses
We now carry out an impulse response analysis with respect to the economic shocks represented by $w_{\mathrm{out},t}$ and $w_{\mathrm{oil},t}$. We reconstruct empirical scenarios and compare our results to the data. Thereby, the impulse responses are associated with particular shocks at particular points in time and their trajectories can be compared with the actual development of the data. This offers a distinct advantage over impulse responses created by placing identifying restrictions on the covariance matrix. Figure 2a,b explores the period around the first oil crisis, where the fiscal expansions in 1972:4-1973:1 are followed by the oil shock in 1973:3-4. Likewise, Figure 2c,d explores the period around the second oil crisis, where the fiscal expansion in 1979:2 is followed by the oil shock in 1979:3. In both cases, we provide joint impulse responses and compare these to real data over a five-year horizon in Figure 2. All joint impulses perform remarkably well compared to the scenario under consideration. What is more, the impulse response functions do not decline in performance across each scenario, indicating a temporal stability in causal transmission. This is addressed further in Section 6.6.
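The mechanics of an impulse response to a dummy variable can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Figure 2: it assumes a bivariate second-order vector autoregression in levels without deterministic terms, and the coefficient matrices `a1`, `a2` and the dummy coefficients `gamma_oil` are hypothetical values rather than the estimates in Table 1.

```python
import numpy as np

def dummy_impulse_response(a1, a2, gamma, horizon):
    """Response of a bivariate VAR(2) to a one-period unit impulse in a dummy."""
    a1, a2, gamma = (np.asarray(a, dtype=float) for a in (a1, a2, gamma))
    resp = np.zeros((horizon + 1, 2))
    resp[0] = gamma  # impact effect of the dummy in the reduced form
    for h in range(1, horizon + 1):
        resp[h] = a1 @ resp[h - 1]
        if h >= 2:
            resp[h] += a2 @ resp[h - 2]
    return resp

# Hypothetical coefficients for the system (v, C); not taken from the paper.
a1 = [[0.6, 0.1], [0.0, 0.5]]
a2 = [[0.2, 0.0], [0.0, 0.1]]
gamma_oil = [0.3, 1.0]  # hypothetical reduced-form dummy coefficients for (v, C)
resp = dummy_impulse_response(a1, a2, gamma_oil, horizon=20)
print(resp[:3])
```

A horizon of 20 quarters corresponds to the five-year window used in the comparison with the data. A joint scenario with several dummy dates could be obtained by summing the responses shifted to the respective dates, since the model is linear.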

Lucas Critique
Major shocks such as the oil crises and fiscal expansions change the policy environment and, in turn, may influence the behavior of individual agents. It has long been a concern whether this results in instability for the parameters of an economic model, rendering it useless for analyzing the effect of implementing the policy. This is known as the Lucas (1976) critique, although the concern goes back to Frisch and Haavelmo. Engle and Hendry (1993) argue that tests of super exogeneity are of interest when seeking to address the Lucas critique. Causal transmission is relevant in a similar way. We illustrate the use of causal transmission in policy analysis by performing a recursive analysis of the money data.
Previously, $w_{\mathrm{oil},t}$ was constructed as the sum of impulse indicators across the two oil crises. Now, we construct dummies $w_{\mathrm{oil1},t}, w_{\mathrm{oil2},t}$ for 1973:3-4 and 1979:3, respectively, so that $w_{\mathrm{oil},t} = w_{\mathrm{oil1},t} + w_{\mathrm{oil2},t}$. We re-estimate the equations for $(v_t \mid C_t)$, $(C_t)$ reported in Table 3 over subsamples 1963:4-1977:2 and 1963:4-1989:3 using the split oil dummy. It is clear that the transmission of the first catalyst $w_{\mathrm{oil1}} \to C \to v$ does not differ in a statistically significant way from the transmission of the second catalyst $w_{\mathrm{oil2}} \to C \to v$. Deconstructing the catalyst $w_{\mathrm{out}}$ provides similar evidence for the stability of the causal transmission of the output shocks. The search for causal transmission in well-specified models therefore seems relevant when considering the Lucas critique. This does, of course, go hand in hand with the fact that the model in Table 1 passes recursive specification tests, such as the max Chow test.

Concluding Remarks
Causal transmission has been introduced to capture the idea that large economic shocks may transmit gradually through the macroeconomy.
There are three ingredients to the definition of causal transmission of catalyst w through z to y. First, we need a non-trivial Markov structure w-z-y, that is, the Markov structure f(y, z|w) = f(y|z)f(z|w) needs to be non-trivial in the sense that y, z are dependent and z, w are dependent. Secondly, we need a non-trivial transmission between w, y so that w, y are dependent. Thirdly, we need a causal assumption for the catalyst w. When these conditions are satisfied, we write w → z → y. We have shown how this definition can be extended to the transmission of two unrelated catalysts.
Causal transmission is defined for general densities and it does not require normality. The first two conditions to the definition of causal transmission are testable using observational data. In standard models, the first condition of a non-trivial Markov structure implies the second condition of a non-trivial transmission. These standard models include normal models and mixtures of normal and logit/probit models.
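To see why the first condition implies the second in the linear normal case, consider a sketch in which the Markov structure $w$-$z$-$y$ is written as two regressions. The coefficient and error symbols below are illustrative and are not taken from the numbered equations of the paper; the errors are assumed independent of $w$ and of each other.

```latex
\begin{align*}
z &= \gamma_{zw}\, w + \varepsilon_{z}, \qquad
y = \gamma_{yz}\, z + \varepsilon_{y\cdot z}, \\
y &= \gamma_{yz}\gamma_{zw}\, w
     + \bigl(\gamma_{yz}\,\varepsilon_{z} + \varepsilon_{y\cdot z}\bigr).
\end{align*}
```

A non-trivial Markov structure means $\gamma_{yz} \neq 0$ and $\gamma_{zw} \neq 0$. The reduced-form coefficient of $w$ in the equation for $y$ is the product $\gamma_{yz}\gamma_{zw}$, which is then also non-zero, so $y$ and $w$ are dependent and the transmission is non-trivial.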
Causal transmissions also require a catalyst. As in instrumental variable analysis, the catalyst can be found as a natural experiment formulated prior to the empirical analysis or it may be discoverable from the empirical analysis of observational data. Outlier detection algorithms such as Autometrics by Doornik (2009) may be helpful in this respect. The causal transmission relies on an economic interpretation of the catalyst where the narrative approach of Romer and Romer (2010) may prove helpful. Catalysts have to be statistically significant to be discoverable, and the evidence of causal transmission is stronger if found in several instances. In these ways, the empirical analysis of causal transmission is consistent with the criteria for causation in Hill (1965).
The present analysis was inspired by Bårdsen et al. (2017), who construct 3-year ahead quarterly forecasts from March 2007 generated from their macro-econometric model for Norway. In 2008, policymakers in Norway and abroad changed the policy rate dramatically in response to the financial crisis, creating a large shift of the short-term interest rate. It appears that this had the causal impact of offsetting potential big shifts in the labor market in such a way that the macro-econometric model produces good forecasts of unit labor cost, inflation and unemployment despite the financial crisis. It is plausible that the effects seen in the forecasts of the Norwegian macro-econometric models of Bårdsen et al. (2017) could be described as a combination of a major financial shock and a subsequent policy reaction calibrated to offset the financial shock in the labor market.
It would be interesting to develop the ideas on causal transmission in larger statistical models of the economy. First, to what extent does a causal transmission analysis of f(y, z|w) extend to f(x, y, z|w)? Second, where interventions can be identified as catalysts that induce a particular dependence structure or causal transmission among modeled variables, these can be deployed out of sample to attenuate adverse shocks by targeting dependence chains, as opposed to single variables.

Author Contributions:
The authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Proof of Theorem 1.
Condition (12) shows $f(y \mid z, w) = f(y \mid z)$. Thus, $f(y, z \mid w) = f(y \mid z, w) f(z \mid w) = f(y \mid z) f(z \mid w)$. (A3) First, rearrange to obtain $f(y \mid z, w) = f(y, z \mid w)/f(z \mid w) = f(y \mid z)$.
Now, apply Lemma A1. The first statement on the left of (A1) holds through (A4), while the right hand side fails through (A5). Thus, the second statement on the left hand side of (A1) fails as desired.

Proof of Lemma 2.
It is assumed that $w$-$z$-$y$ is a non-trivial Markov structure, so that $f(y, z \mid w) = f(y \mid z) f(z \mid w)$ with $f(y \mid z) \neq f(y)$ and $f(z \mid w) \neq f(z)$. It has to be argued that $f(y \mid w) \neq f(y)$. We prove by contradiction and show that $f(y \mid w) = f(y)$ implies that $f(y \mid z) = f(y)$ or $f(z \mid w) = f(z)$.