Article

Testability of Instrumental Variables in Linear Non-Gaussian Acyclic Causal Models

1
School of Mathematical Sciences, Peking University, Beijing 100871, China
2
School of Mathematics and Statistics, Beijing Technology and Business University, Beijing 100048, China
3
School of Computer, Guangdong University of Technology, Guangzhou 510006, China
4
Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA 15213, USA
5
Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi 7909, United Arab Emirates
*
Author to whom correspondence should be addressed.
Entropy 2022, 24(4), 512; https://doi.org/10.3390/e24040512
Submission received: 5 March 2022 / Revised: 1 April 2022 / Accepted: 3 April 2022 / Published: 5 April 2022
(This article belongs to the Special Issue Causal Inference for Heterogeneous Data and Information Theory)

Abstract

This paper investigates the problem of selecting instrumental variables relative to a target causal influence $X \to Y$ from observational data generated by linear non-Gaussian acyclic causal models in the presence of unmeasured confounders. We propose a necessary condition for detecting variables that cannot serve as instrumental variables. Unlike many existing conditions for continuous variables, which require that two or more valid instrumental variables be present in the system, our condition is designed for a single instrumental variable. We then characterize the graphical implications of our condition in linear non-Gaussian acyclic causal models. Given that the existing graphical criteria for instrument validity are not directly testable from observational data, we further show whether and how such graphical criteria can be checked by exploiting our condition. Finally, we develop a method to select the set of candidate instrumental variables given observational data. Experimental results on both synthetic and real-world data show the effectiveness of the proposed method.

1. Introduction

Estimating causal effects from observational data is an important problem, especially in the presence of unmeasured confounding. The instrumental variable (IV or instrument) model is a general approach to estimate causal effect in the presence of unobserved variables [1,2,3,4] and is used in a wide range of literature, such as economics [5,6], sociology [4,7], and epidemiology [8,9].
A major challenging problem in an instrumental variable model is how to select a valid IV to infer the causal effect of one variable X on another variable Y. In general, IVs need to be chosen based on domain knowledge or expert experience. However, it is sometimes difficult to select a valid IV without precise prior knowledge of causal structure, and an invalid IV may cause a biased estimation of the effect of X on Y [10]. Therefore, it is desirable to investigate ways of selecting IVs only from observed variables.
Although it is not possible to test whether a variable is a valid IV only from the joint distribution of observed variables, there exist several methods for testing whether a variable of interest is an invalid IV. Pearl [11] provided a necessary condition, called the instrumental inequality, for a general instrument model, which can be used to test whether a variable is a candidate IV for discrete variables. Inspired by the instrumental inequality, various contributions were made towards discovering the testability of IV validity in different scenarios [12,13,14,15]. More recently, Kédagni and Mourifié [16] considered a more general case where treatment is discrete and there are no restrictions on IV and outcome and proposed generalized instrumental inequalities to test the IV independence assumption. However, those approaches fail to work when treatment is a continuous variable. Pearl [11] conjectured that instrument validity cannot be tested in the case where treatment is a continuous variable without any further assumption, which was recently proved by Gunsilius [17].
There exist works in the literature that address the continuous variable setting. Kuroki and Cai [18] utilized vanishing Tetrad conditions [19] and proposed a new necessary condition to solve this problem in the linear structural causal model. However, their method needs at least three valid IVs in the observed variables. Kang et al. [20] proposed the sisVIVE algorithm to estimate the causal effect in the case where more than half of the variables are valid IVs in the observed variables. Later, Silva and Shimizu [21] appear to be the first to exploit the non-Gaussianity property in the linear structural causal model. They utilized the generalized Tetrad conditions (t-separation) [22,23] and designed the IV-TETRAD algorithm to select IVs. Unfortunately, their conditions still require two or more IVs as a prerequisite for instrument testing and may rule out some correct IVs. For instance, consider the causal graph in Figure 1. Assume the causal relationships between variables are linear and that the noise terms follow non-Gaussian distributions. Then, IV-TETRAD returns an empty set of candidate IVs even though $Z$ is a valid IV relative to $X \to Y$.
In this paper, we show that, for continuous data, a single variable $Z$ being a valid IV relative to $X \to Y$ imposes certain constraints in a linear non-Gaussian acyclic causal model. Specifically, we make the following contributions:
1. We propose a necessary condition for detecting variables that cannot serve as (conditional) IVs, based on the so-called generalized independent noise (GIN) condition [24]. We call it the instrumental variable generalized independent noise (IV-GIN) condition and characterize its graphical implications in linear non-Gaussian acyclic causal models.
2. We further show whether and how the graphical criteria of an instrumental variable can be checked by exploiting the IV-GIN condition.
3. We develop a method to select the set of candidate IVs for the target causal influence $X \to Y$ from observational data using IV-GIN conditions.
4. We demonstrate the efficacy of our algorithm on both synthetic and real-world data.

2. Related Work

In this section, we review some of the key works that are most closely related to ours.

2.1. Instrument Variable Models

The instrumental variable (IV) model is a general approach to estimate the causal effect of a treatment X on an outcome Y of interest in presence of unobserved variables [1,2,3]. That is to say, the IV model is an unbiased estimator of the causal effect of X on Y of interest [4,6]. In practice, one can obtain IVs based on domain knowledge or expert experience. However, it is sometimes difficult to select the valid IV without precise prior knowledge of causal structure, and an invalid IV may cause a biased estimation of the effect of X on Y [10]. In this paper, we investigate data-driven ways of selecting IVs only from observed variables. The current methods for selecting IVs can be roughly divided into the following two settings.
In the literature of the discrete variable setting, Pearl [11] provided a necessary condition, called instrumental inequality, which can be used to test whether a variable is an invalid IV. Inspired by instrumental inequality, various contributions were made to discover IV validity’s testability in different scenarios. For instance, Manski [12] showed the same instrumental inequality in the missing data model. Palmer et al. [13] and Wang et al. [15] considered useful tests of the instrumental inequality in the binary instrumental variable model. Kitagawa [14] introduced another test of the instrument in the case where the outcome is continuous. More recently, Kédagni and Mourifié [16] proposed generalized instrumental inequalities to test the IV independence assumption in the case where treatment is discrete and there are no restrictions on IV and outcome. Gunsilius [17] recently proved the Pearl’s conjecture that instrument validity cannot be tested in the case where treatment is a continuous variable without any further assumption [11].
There exist works in the literature that address the continuous variable setting. For instance, Kuroki and Cai [18] proposed a new necessary condition to resolve this problem in the linear structural causal model using the so-called Tetrad conditions [19]. Later, Kang et al. [20] proposed the sisVIVE algorithm to estimate the causal effect in the case where more than half of the candidate instruments are valid (majority rule). Recently, Silva and Shimizu [21] appear to be the first to exploit the non-Gaussianity property in the linear structural causal model. They designed an IV-TETRAD algorithm to select IVs using the generalized Tetrad conditions (t-separation) [22,23]. Unfortunately, the above methods require two or more IVs as a prerequisite for instrument testing, and some methods (e.g., IV-TETRAD approach) may rule out some correct IVs.
Our work focuses on the continuous setting. Unlike the existing works, we show that a single variable $Z$, being a valid IV relative to $X \to Y$, imposes certain constraints in a linear non-Gaussian acyclic causal model.

2.2. Causal Graphical Models

Graphical models with latent variables are extensively studied in the literature. Unlike the existing methods of learning the undirected graphical model [25,26,27,28,29,30,31,32,33], here, we focus only on the most closely related work on causal graphical models, i.e., a directed acyclic graph (DAG) G representing the relations of causation among the variables [4,7]. Within the space of discovering a causal graphical model on observed data, the commonly used strategies are as follows.
One typical strategy for handling this problem is using conditional independence tests to learn the causal graph over the observed variables [4,7]. Well-known algorithms along this line include Fast Causal Inference (FCI) [34], Really Fast Causal Inference (RFCI)  [35], and their variants [36]. These methods learn the equivalence class of maximal ancestral graphs (MAGs), as represented by PAG (partial ancestral graph). However, these works focus on estimating the causal structure over only observed variables and can not recover the precise causal graph. In our work, we try to discover the set of candidate IVs from observational variables without prior knowledge of causal graphs.
Another strategy is functional causal model-based approaches. For instance, Hoyer et al. [37] showed that the causal order between any two observed variables is identifiable in the linear non-Gaussian causal model. Later, more efficient methods were proposed to learn the causal graph over observed variables  [38,39]. Recently, Salehkaleybar et al. [40] showed that the set of all possible causal effects between any two observed variables is identifiable in the same setting. Unfortunately, the size of the equivalence class of the identified causal effects could be very large, and their method requires specifying the number of latent variables a priori [21].
There is also an interesting strategy based on the “Sparse plus Low Rank Matrix Decomposition”. Many methods are proposed to address the challenge of learning a latent Gaussian graph model. For instance, Chandrasekaran et al. [26] formulated a convex objective involving nuclear norm penalization maximum likelihood for Gaussian graphical model estimation with a few latent confounders. Zorzi and Sepulchre [28] presented a two-step procedure for estimating autoregressive (AR) latent variable graphical models. Later, Ciccone et al. [41] reformulated this decomposition problem for the setting where only the sample covariance is available, and the difference between the sample covariance and the actual one is non-negligible. Alpago et al. [42] proposed an identification procedure for a sparse graphical model associated with a reciprocal process. However, these methods focus on the undirected graphical model. In the field of a causal graphical model, Frot et al. [43] introduced the LRpSC+GES algorithm to learn the causal structure with some hidden variables. Agrawal et al. [44] proposed a practical algorithm, the DeCAMFounder, to consistently estimate causal relationships in the nonlinear, pervasive confounding setting. Although these methods are used in a range of fields, they usually assume that the underlying graph among the observed variables is sparse, and there are a few hidden variables that have a direct effect on many of the observed variables. The modeling of our paper does not restrict those assumptions and allows arbitrary hidden structures.
In summary, unlike the existing methods of recovering causal graphical models, our goal is to select the set of candidate IVs from observational variables without precise prior knowledge of causal graph.

3. Preliminaries

3.1. Notation and Graph Terminology

We follow the notational conventions used in [7]. Let $G$ be a directed acyclic graph (DAG) with node (or vertex) set $\mathbf{V}$ and directed edge set $\mathbf{E}$. Here, we use “variable” and “node” interchangeably. A path is a sequence of nodes $\{V_1, \ldots, V_r\}$ such that $V_i$ and $V_{i+1}$ are adjacent in $G$, where $1 \le i < r$. Furthermore, if the edge between $V_i$ and $V_{i+1}$ has its arrow pointing to $V_{i+1}$ for $i = 1, 2, \ldots, r-1$, we say that the path is directed from $V_1$ to $V_r$. A collider on a path $\{V_1, \ldots, V_p\}$ is a node $V_i$, $1 < i < p$, such that $V_{i-1}$ and $V_{i+1}$ are parents of $V_i$. We say a path is active if this path can be traced without traversing a collider. A trek between $V_i$ and $V_j$ is a path that does not contain any colliders in $G$. The sets of all parents and children of $V_i$ are denoted by $\mathrm{Pa}(V_i)$ and $\mathrm{Ch}(V_i)$, respectively. In addition, for a set $\mathbf{O}$, $|\mathbf{O}|$ denotes the number of elements of $\mathbf{O}$. Other commonly used concepts in graphical models, such as d-separation, can be found in [4,7].

3.2. Instrumental Variable Model

Here, we follow the notational conventions and definitions used in [45]. Let X be the treatment (exposure), Y be the outcome, and U be the set of unmeasured confounders between X and Y.
Definition 1 ((Conditional) Instrumental Variable Criteria). Given the causal graph $G$, a variable $Z$ is a (conditional) instrumental variable relative to a target causal effect $X \to Y$ given $\mathbf{W}$ if and only if it satisfies the following conditions:
1. $\mathbf{W}$ contains only nondescendants of $Y$ in $G$;
2. $\mathbf{W}$ d-separates $Z$ from $Y$ in the graph obtained by removing the edge $X \to Y$ from $G$;
3. $\mathbf{W}$ does not d-separate $Z$ from $X$ in $G$.
For simplicity, we call these three conditions the instrument criteria.
Definition 2 (IV Estimator). Suppose variable $Z$ is a (conditional) IV for $X \to Y$ given $\mathbf{W}$. Then the causal effect of $X$ on $Y$, denoted by $b_{YX}$, is identified in a linear model and given by
$$ b_{YX} = \frac{\sigma_{ZY \cdot \mathbf{W}}}{\sigma_{ZX \cdot \mathbf{W}}}, $$
where $\sigma_{ZY \cdot \mathbf{W}}$ denotes the partial covariance between $Z$ and $Y$ given the set $\mathbf{W}$, and $\sigma_{ZX \cdot \mathbf{W}}$ denotes the partial covariance between $Z$ and $X$ given the set $\mathbf{W}$.
Figure 2 illustrates a simple instrumental variable model, where $Z$ is an IV conditioning on $\{W_1, W_2\}$ for the relation $X \to Y$. The causal effect $b_{YX}$ is $\sigma_{ZY \cdot \{W_1, W_2\}} / \sigma_{ZX \cdot \{W_1, W_2\}}$.
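As a rough illustration of the estimator in Definition 2, the following sketch computes the ratio of partial covariances for the special case of a single conditioning variable. The data-generating model, its coefficients, and the helper names (`cov`, `partial_cov`, `iv_estimate`) are assumptions made for this example and do not come from the paper.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)

def partial_cov(a, b, w):
    """Partial covariance of a and b given one conditioning variable w."""
    return cov(a, b) - cov(a, w) * cov(w, b) / cov(w, w)

def iv_estimate(z, x, y, w=None):
    """Ratio estimator of b_YX; w is an optional single conditioning variable."""
    if w is None:
        return cov(z, y) / cov(z, x)
    return partial_cov(z, y, w) / partial_cov(z, x, w)

# Hypothetical model (not from the paper): W -> Z, W -> Y, Z -> X, X -> Y,
# plus a latent confounder U -> X, U -> Y. The true effect b_YX is 1.5.
rng = random.Random(0)
n = 20000
noise = lambda: rng.uniform(-1.0, 1.0)  # non-Gaussian, zero mean
W = [noise() for _ in range(n)]
U = [noise() for _ in range(n)]
Z = [0.8 * w + noise() for w in W]
X = [2.0 * z + 0.5 * u + noise() for z, u in zip(Z, U)]
Y = [1.5 * x + 2.0 * u + 1.0 * w + noise() for x, u, w in zip(X, U, W)]

b_conditional = iv_estimate(Z, X, Y, W)  # conditions on W, as Definition 2 requires
b_marginal = iv_estimate(Z, X, Y)        # ignores W, so Z <- W -> Y biases the ratio
```

In this toy model, $Z$ satisfies the instrument criteria only when conditioning on $W$, so the conditional estimate should approach the true effect while the marginal one should not.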

3.3. Problem Setup

In this paper, we assume that the system of interest is a linear non-Gaussian acyclic causal model with variables in $\mathbf{V} = \{X, Y\} \cup \mathbf{U} \cup \mathbf{O}$, where $X$ is the treatment, $Y$ is the outcome, $\mathbf{U}$ is the set of unmeasured (latent or hidden) variables, and $\mathbf{O}$ is the set of other measured variables. In particular, without loss of generality, we assume that all variables in $\mathbf{V}$ have a zero mean. Each variable $V_i \in \mathbf{V}$ is generated according to the following linear structural equation model (SEM):
$$ V_i = \sum_{V_j \in \mathrm{Pa}(V_i)} b_{ij} V_j + \varepsilon_{V_i}, $$
where $b_{ij}$ is the causal strength from $V_j$ to $V_i$. All noise terms $\varepsilon_{V_i}$ are continuous random variables following non-Gaussian distributions with nonzero variances and are independent of each other. We restrict our attention to the recursive model [46]. That is to say, the causal relationships among variables can be represented by a DAG [4,7]. This model is also known as the linear, non-Gaussian, acyclic model (LiNGAM) when all variables in $\mathbf{V}$ are observed [47].
Our problem of interest is to study the testability of IV validity for the relation $X \to Y$ in a linear non-Gaussian acyclic causal model. To this end, theoretically, we need to investigate the testability of the instrument criteria from observational variables.

4. Necessary Condition for Instrumental Variable

In this section, we first give a simple example to show that a valid IV imposes some constraints with the help of non-Gaussianity. Then, we give our necessary condition for (conditional) IVs by using generalized independent noise (GIN) conditions [24]. Finally, we present the graphical implications of the proposed condition in linear non-Gaussian causal models. To improve readability, we defer all proofs to Appendix A.

4.1. A Motivating Example

Before showing the theoretical results, let us look at two simple graphs shown in Figure 3. Suppose the generating mechanisms of two subgraphs are as follows:
  • Subgraph (a): $U_1 = \varepsilon_{U_1}$, $Z = \varepsilon_Z$, $X = 2Z + 0.5U_1 + \varepsilon_X$, and $Y = X + 2U_1 + \varepsilon_Y$;
  • Subgraph (b): $U_1 = \varepsilon_{U_1}$, $Z = U_1 + \varepsilon_Z$, $X = 2Z + 0.5U_1 + \varepsilon_X$, and $Y = X + 2U_1 + \varepsilon_Y$.
Here, we consider two cases, namely Gaussian and uniform cases:
  • Gaussian Case: All noise terms in subgraphs (a) and (b) are generated from the standard Gaussian distributions.
  • Uniform Case: All noise terms in subgraphs (a) and (b) are generated from the uniform distribution over the interval $[0, 1]$.
Let $Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X$ be the surrogate-variable of $\{Y, X\}$ relative to $Z$. Figure 4 shows the scatter plots of $Z$ and $Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X$ for the two cases. Interestingly, in the Gaussian case, we find that no matter whether $Z$ is an IV or not, $Z$ and $Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X$ are statistically independent, while in the uniform case, $Z$ and $Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X$ are statistically dependent if $Z$ is an invalid IV. These observations imply that non-Gaussianity (as indicated by the uniform distribution) is beneficial for finding out whether a continuous variable is a candidate IV relative to $X \to Y$.
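The observation above can be reproduced numerically with a short simulation. The sketch below draws samples from subgraphs (a) and (b), forms the surrogate-variable, and measures its dependence on $Z$ with a crude fourth-order proxy, the correlation between the squared (centred) variables. This proxy is an assumption chosen to keep the example dependency-free; it is not the HSIC-based test used later in the paper, and it can miss dependence that HSIC would detect. The uniform noise is centred to match the zero-mean assumption.

```python
import random

def simulate(graph, noise, n=20000, seed=0):
    """Draw (Z, X, Y) from subgraph 'a' or 'b' with 'gaussian' or 'uniform' noise."""
    rng = random.Random(seed)
    if noise == "gaussian":
        draw = lambda: rng.gauss(0.0, 1.0)
    else:
        draw = lambda: rng.uniform(0.0, 1.0) - 0.5  # centred uniform noise
    Z, X, Y = [], [], []
    for _ in range(n):
        u, e_z, e_x, e_y = draw(), draw(), draw(), draw()
        z = e_z if graph == "a" else u + e_z
        x = 2.0 * z + 0.5 * u + e_x
        y = x + 2.0 * u + e_y
        Z.append(z)
        X.append(x)
        Y.append(y)
    return Z, X, Y

def centred(a):
    m = sum(a) / len(a)
    return [v - m for v in a]

def corr(a, b):
    """Pearson correlation of two sequences."""
    a, b = centred(a), centred(b)
    s_ab = sum(p * q for p, q in zip(a, b))
    return s_ab / (sum(p * p for p in a) * sum(q * q for q in b)) ** 0.5

def surrogate(Z, X, Y):
    """Y - (sigma_YZ / sigma_XZ) X, computed on centred samples."""
    z, x, y = centred(Z), centred(X), centred(Y)
    ratio = sum(p * q for p, q in zip(y, z)) / sum(p * q for p, q in zip(x, z))
    return [q - ratio * p for p, q in zip(x, y)]

def dependence_proxy(Z, X, Y):
    """corr(Z^2, E^2); near zero when Z and the surrogate E are independent."""
    z = centred(Z)
    e = surrogate(Z, X, Y)
    return corr([v * v for v in z], [v * v for v in e])

for graph in ("a", "b"):
    for noise in ("gaussian", "uniform"):
        print(graph, noise, round(dependence_proxy(*simulate(graph, noise)), 3))
```

Under these assumptions, only the combination of subgraph (b) with uniform noise should give a proxy clearly away from zero, which mirrors the pattern described for Figure 4.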

4.2. IV-GIN Condition for Instrumental Variable

Below, we give mathematical characterizations of the above observation by using the GIN condition. Before that, we first review the GIN condition formulated by  Xie et al. [24] and the Darmois–Skitovitch theorem that characterizes the independence of two linear statistics given in [48].
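Definition 3 (GIN condition). Let $\mathbf{P}$ and $\mathbf{Q}$ be two observed random vectors. Suppose the variables follow the linear non-Gaussian acyclic causal model. Define the surrogate-variable of $\mathbf{P}$ relative to $\mathbf{Q}$ as $E_{\mathbf{P}||\mathbf{Q}} := \omega^{\top}\mathbf{P}$, where $\omega$ satisfies $\omega^{\top}\mathbb{E}[\mathbf{P}\mathbf{Q}^{\top}] = \mathbf{0}$ and $\omega \neq \mathbf{0}$. We say that $(\mathbf{Q}, \mathbf{P})$ follows the GIN condition if and only if $E_{\mathbf{P}||\mathbf{Q}}$ is statistically independent of $\mathbf{Q}$.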
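To make the role of $\omega$ concrete, the sketch below computes it for the case that arises in the IV setting with one conditioning variable, $\mathbf{P} = (X, Y, W)$ and $\mathbf{Q} = (Z, W)$. With three variables in $\mathbf{P}$ and two in $\mathbf{Q}$, a vector orthogonal to both columns of the cross-covariance matrix can be obtained from a cross product; this shortcut, the toy model, and the function names are assumptions for illustration. Because the variables are zero-mean, sample covariances stand in for $\mathbb{E}[\mathbf{P}\mathbf{Q}^{\top}]$.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)

def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def gin_omega(P, Q):
    """A nonzero omega with omega^T Cov(P, Q) = 0, for |P| = 3 and |Q| = 2."""
    col_0 = [cov(p, Q[0]) for p in P]  # Cov(P, Q_0)
    col_1 = [cov(p, Q[1]) for p in P]  # Cov(P, Q_1)
    return cross(col_0, col_1)

def surrogate(P, omega):
    """Sample values of E_{P||Q} = omega^T P."""
    return [sum(w * p[i] for w, p in zip(omega, P)) for i in range(len(P[0]))]

# Hypothetical model (not from the paper): W -> Z, W -> Y, Z -> X, X -> Y,
# with a latent confounder U -> X, U -> Y and true effect b_YX = 1.5.
rng = random.Random(0)
n = 20000
noise = lambda: rng.uniform(-1.0, 1.0)
W = [noise() for _ in range(n)]
U = [noise() for _ in range(n)]
Z = [0.8 * w + noise() for w in W]
X = [2.0 * z + 0.5 * u + noise() for z, u in zip(Z, U)]
Y = [1.5 * x + 2.0 * u + 1.0 * w + noise() for x, u, w in zip(X, U, W)]

omega = gin_omega([X, Y, W], [Z, W])
E = surrogate([X, Y, W], omega)
implied_effect = -omega[0] / omega[1]  # coefficient on X after scaling Y's weight to 1
```

By construction, the surrogate is uncorrelated with every element of $\mathbf{Q}$; the GIN condition asks for the stronger property of full statistical independence, which has to be checked with a separate independence test.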
Theorem 1 (Darmois–Skitovitch Theorem). Define two random variables $V_1$ and $V_2$ as linear combinations of independent random variables $n_1, \ldots, n_p$:
$$ V_1 = \sum_{i=1}^{p} \alpha_i n_i, \qquad V_2 = \sum_{i=1}^{p} \beta_i n_i, $$
where the $\alpha_i$, $\beta_i$ are constant coefficients. If $V_1$ and $V_2$ are independent, then the random variables $n_j$ for which $\alpha_j \beta_j \neq 0$ are Gaussian.
The above theorem states that if there exists a non-Gaussian $n_j$ for which $\alpha_j \beta_j \neq 0$, then $V_1$ and $V_2$ are dependent.
We now give the necessary condition of valid IVs by using GIN conditions.
Theorem 2 (Necessary Condition for IV). Let $G$ be a linear non-Gaussian acyclic causal model. Let treatment $X$, outcome $Y$, $Z$, and $\mathbf{W}$ be correlated random variables in $G$. Assume faithfulness holds. If $Z$ is a valid IV conditioning on $\mathbf{W}$ relative to $X \to Y$ in $G$, then $(\{Z, \mathbf{W}\}, \{X, Y, \mathbf{W}\})$ follows the GIN condition.
We term this necessary condition the IV-GIN (instrumental variable-generalized independent noise) condition. For the rest of the paper, we say that $[Z \,||\, \mathbf{W}]$ follows the IV-GIN condition relative to $X \to Y$ if and only if $(\{Z, \mathbf{W}\}, \{X, Y, \mathbf{W}\})$ follows the GIN condition. Theorem 2 indicates that one may test whether a variable $Z$ is an invalid IV conditioning on $\mathbf{W}$ relative to $X \to Y$ by just testing the IV-GIN condition.
Example 1 (Motivating example, continued). Let us continue to consider the two causal graphs in Figure 3. Assume that all noise terms follow non-Gaussian distributions. According to the linear generating mechanism and the IV-GIN condition, for subgraph (a),
$$ Z = \varepsilon_Z, $$
$$ E_{\{Y,X\}||Z} = Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X = 2U_1 + \varepsilon_Y. $$
We find that there is no common non-Gaussian independent component shared by $E_{\{Y,X\}||Z}$ and $Z$. Thus, $E_{\{Y,X\}||Z}$ is independent of $Z$ by the Darmois–Skitovitch theorem.
However, for subgraph (b),
$$ Z = \varepsilon_{U_1} + \varepsilon_Z, $$
$$ E_{\{Y,X\}||Z} = Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X = (2 - 2.5t)\,U_1 + \varepsilon_Y - 2t\,\varepsilon_Z - t\,\varepsilon_X, $$
where $t = \frac{2\,\mathrm{Var}(\varepsilon_{U_1})}{2.5\,\mathrm{Var}(\varepsilon_{U_1}) + 2\,\mathrm{Var}(\varepsilon_Z)}$. We find that there is one common non-Gaussian independent component shared by $E_{\{Y,X\}||Z}$ and $Z$, i.e., $\varepsilon_Z$, because $2t \neq 0$. Thus, $E_{\{Y,X\}||Z}$ and $Z$ are dependent by the Darmois–Skitovitch theorem. These facts theoretically verify the results shown in Figure 4.

4.3. Graphical Implications of IV-GIN Condition in Linear Non-Gaussian Causal Models

In this section, we characterize the graphical implications of the IV-GIN condition in linear non-Gaussian causal models. The following theorem shows the connection between IV-GIN condition and the graphical properties of the variables, and an illustrative example is given accordingly.
Theorem 3. Suppose all variables $\mathbf{V}$ follow the linear non-Gaussian acyclic causal model and that faithfulness holds. Let treatment $X$, outcome $Y$, $Z$, and $\mathbf{W}$ be correlated random variables in $\mathbf{V}$. Then, $[Z \,||\, \mathbf{W}]$ follows the IV-GIN condition relative to $X \to Y$ and there is no proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$ such that $[Z \,||\, \tilde{\mathbf{W}}]$ follows the IV-GIN condition relative to $X \to Y$ if and only if the following three conditions hold:
1. There exists a node $C \in \mathbf{V}$, $C \notin \mathbf{W}$, such that for every trek $\pi$ between a node $V_p \in \{X, Y, \mathbf{W}\}$ and a node $V_q \in \{Z, \mathbf{W}\}$, (a) $\pi$ goes through at least one node in $\{C, \mathbf{W}\}$, denoted by $V_k$, and (b) $V_k$ has its arrow pointing to $V_p$ in $\pi$. (In other words, $V_k$ is causally earlier (according to the causal order) than $V_p$ on $\pi$.)
2. There is at least one directed path between any one node in $\{C, \mathbf{W}\}$ and any one node in $\{X, Y\}$.
3. There is no proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$ that satisfies conditions 1 and 2.
Example 2. Consider the causal graphs shown in Figure 3 again. For subgraph (a), there exists a node $X$, with $\mathbf{W} = \emptyset$, such that (1) every trek between $Z$ and $\{X, Y\}$, e.g., $Z \to X \to Y$, goes through $X$ and (2) $X$ has its arrow pointing to $Y$. Besides, there is at least one directed path between $X$ and any one node in $\{X, Y\}$. According to Theorem 3, we know that $[Z \,||\, \emptyset]$ follows the IV-GIN condition relative to $X \to Y$ in subgraph (a). However, for subgraph (b), we cannot find a node $C$ such that every trek between $Z$ and a node in $\{X, Y\}$ goes through $C$ and $C$ is causally earlier than $\{X, Y\}$; consider, e.g., the treks $Z \to X$ and $Z \leftarrow U_1 \to Y$. This implies that $[Z \,||\, \emptyset]$ violates the IV-GIN condition in subgraph (b) according to Theorem 3.

5. Testability of Instrument Criteria Validity in Terms of IV-GIN Conditions

In this section, we investigate the testability of instrument criteria by exploiting our IV-GIN condition. Note that the last condition of instrument criteria, i.e., that W does not d-separate Z from X in G, can be easily checked by the d-separation criterion because W , Z, and X are observed variables [4]. Therefore, we focus next on the first two conditions of instrument criteria.

5.1. Condition 1 of Instrument Criteria

Below, we first show that the first condition, i.e., that W contains only nondescendants of Y in G, is testable by using IV-GIN conditions.
Proposition 1. Let $G$ be a linear non-Gaussian acyclic causal model. Let treatment $X$, outcome $Y$, $Z$, and $\mathbf{W}$ be correlated random variables in $G$. Assume faithfulness holds, conditions 2 and 3 of the instrument criteria hold, and there is no proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$ such that $[Z \,||\, \tilde{\mathbf{W}}]$ follows the IV-GIN condition. If $\{Z, \mathbf{W}\}$ contains at least one descendant of $Y$ in $G$, then $[Z \,||\, \mathbf{W}]$ must violate the IV-GIN condition.
Proposition 1 ensures that the IV-GIN condition rules out the invalid IVs that do not satisfy condition 1 of instrument criteria, and an illustrative example is given in Example 3.
Example 3. Let us consider the causal graph in Figure 5. We find that $[Z \,||\, W_1]$ follows the IV-GIN condition because $Z$ is a valid IV conditioning on $W_1$. However, we find that $[Z \,||\, W_2]$ violates the IV-GIN condition because $W_2$ is a descendant of $Y$.

5.2. Condition 2 of Instrument Criteria

Now, we study the second condition, i.e., that $\mathbf{W}$ d-separates $Z$ from $Y$ in the graph obtained by removing the edge $X \to Y$ from $G$. Given the conditional set $\mathbf{W}$, condition 2 can be phrased as follows:
2a. There is no active nondirected path between $Z$ and $Y$ that does not include $X$;
2b. There is no active directed path from $Z$ to $Y$ that does not include $X$.
In the remainder of this subsection, we discuss these two subconditions separately.

5.2.1. Subcondition 2a

It was shown that one can verify the validity of condition 2a in the case where at least two IVs are present in the ground-truth graph [21]. However, that condition is too restrictive and rules out some valid IVs. (A similar conclusion is reported in Proposition 17 of [21].) Figure 1 shows an example in which the method of [21] outputs an empty set of candidate IVs even though $Z$ is a valid IV. In contrast, our IV-GIN condition is relatively mild and avoids ruling out valid IVs. Although one might not fully verify the validity of condition 2a using the IV-GIN condition, most invalid IVs that do not satisfy condition 2a are ruled out, as shown in the following proposition.
Proposition 2. Let $G$ be a linear non-Gaussian acyclic causal model. Let treatment $X$, outcome $Y$, $Z$, and $\mathbf{W}$ be correlated random variables in $G$. Assume faithfulness holds, conditions 1 and 3 of the instrument criteria hold, and there is no proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$ such that $[Z \,||\, \tilde{\mathbf{W}}]$ follows the IV-GIN condition. Furthermore, given $\mathbf{W}$, assume there is at least one active nondirected path between $Z$ and $Y$ that does not include $X$. If, given $\mathbf{W}$, there is no node $C \in \mathbf{V}$ such that all active paths between $Z$ and $Y$ go through $C$ and $C$ has its arrow pointing to $Y$, then $[Z \,||\, \mathbf{W}]$ must violate the IV-GIN condition.
Below, we give an example to illustrate Proposition 2.
Example 4. Consider the causal diagram shown in Figure 6. Given $W_1$, there is one active nondirected path between $Z$ and $Y$, i.e., $Z \leftarrow U_2 \to Y$, and the active paths between $Z$ and $Y$ are $Z \to X \to Y$ and $Z \leftarrow U_2 \to Y$. Thus, we cannot find a node $C$ such that all active paths between $Z$ and $Y$ go through $C$ and $C$ has its arrow pointing to $Y$. This fact implies that $[Z \,||\, W_1]$ violates the IV-GIN condition. That is to say, $Z$ is an invalid IV conditioning on $W_1$ relative to $X \to Y$.
Now, we give a simple example to show that though the IV-GIN condition holds, the condition 2a of instrument criteria is violated.
Example 5. Consider the causal diagram shown in Figure 7. We can find a node $U_2$ such that all active paths between $Z$ and $Y$ go through $U_2$ and $U_2$ has its arrow pointing to $Y$. This implies that $[Z \,||\, \emptyset]$ follows the IV-GIN condition according to Proposition 2. This example tells us the IV-GIN condition is necessary, but not sufficient, to test condition 2a.

5.2.2. Subcondition 2b

We now show that it is hard to verify the validity of condition 2b, even under the non-Gaussian assumption, through the following simple example.
Let us look at the graph in Figure 8, where $Z$ is an invalid IV conditioning on an empty set relative to $X \to Y$.
Suppose the generating mechanism of the graph is as follows:
$$ U_1 = \varepsilon_{U_1}, \qquad Z = \varepsilon_Z, $$
$$ X = \alpha Z + \gamma U_1 + \varepsilon_X, $$
$$ Y = \beta X + \delta U_1 + \lambda Z + \varepsilon_Y. $$
According to the definition of the GIN condition, and using $\sigma_{YZ}/\sigma_{XZ} = \beta + \lambda/\alpha$, we have
$$ E_{\{Y,X\}||Z} = Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X = \left(\delta - \frac{\gamma\lambda}{\alpha}\right) U_1 - \frac{\lambda}{\alpha}\,\varepsilon_X + \varepsilon_Y. $$
Based on the above equation, the component $\varepsilon_Z$ is successfully removed from $E_{\{Y,X\}||Z}$ although $Y$ is generated by $\{Z, X, U_1\}$. This implies that $E_{\{Y,X\}||Z}$ is independent of $Z$ according to the Darmois–Skitovitch theorem. That is to say, $[Z \,||\, \emptyset]$ follows the IV-GIN condition whatever the value of $\lambda$ (note that there is no directed edge between $Z$ and $Y$ when $\lambda = 0$).
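A small simulation can illustrate this limitation. The sketch below samples from the generating mechanism above with hypothetical coefficient values (they are assumptions, not taken from the paper), forms the surrogate-variable, and reports two quantities: the ratio $\sigma_{YZ}/\sigma_{XZ}$, which a direct calculation shows converges to $\beta + \lambda/\alpha$ rather than to $\beta$, and a crude dependence proxy (the correlation of squared, centred variables), which is a stand-in for a proper independence test such as HSIC.

```python
import random

# Hypothetical coefficients for the mechanism of Figure 8 (assumed for illustration).
alpha, gamma, beta, delta, lam = 2.0, 0.5, 1.0, 2.0, 1.5

rng = random.Random(1)
n = 20000
draw = lambda: rng.uniform(-0.5, 0.5)  # zero-mean, non-Gaussian noise

Z, X, Y = [], [], []
for _ in range(n):
    u, e_z, e_x, e_y = draw(), draw(), draw(), draw()
    z = e_z
    x = alpha * z + gamma * u + e_x
    y = beta * x + delta * u + lam * z + e_y  # direct edge Z -> Y with strength lam
    Z.append(z)
    X.append(x)
    Y.append(y)

def centred(a):
    m = sum(a) / len(a)
    return [v - m for v in a]

def corr(a, b):
    """Pearson correlation of two sequences."""
    a, b = centred(a), centred(b)
    s_ab = sum(p * q for p, q in zip(a, b))
    return s_ab / (sum(p * p for p in a) * sum(q * q for q in b)) ** 0.5

z_c, x_c, y_c = centred(Z), centred(X), centred(Y)
ratio = sum(p * q for p, q in zip(y_c, z_c)) / sum(p * q for p, q in zip(x_c, z_c))
E = [q - ratio * p for p, q in zip(x_c, y_c)]            # surrogate-variable
proxy = corr([v * v for v in z_c], [v * v for v in E])   # fourth-order dependence proxy

print("ratio:", round(ratio, 3), "target beta + lam/alpha:", beta + lam / alpha)
print("dependence proxy:", round(proxy, 3))
```

In this setup, the proxy should stay near zero even though $Z$ is invalid, while the ratio estimator is off from $\beta$ by $\lambda/\alpha$; this is consistent with the claim that condition 2b is hard to verify from the IV-GIN condition alone.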

6. Algorithm for Selecting the Candidate IVs

In this section, we leverage the above results and propose a sequential algorithm to select the set of candidate IVs for the target relationship $X \to Y$ without prior knowledge of the causal structure. Notice that the validity of a variable as an IV depends on which set $\mathbf{W}$ we condition on. To identify candidate IVs efficiently, given an observed variable $Z_i$, we start by searching with an empty conditional set and then increase the number of conditional variables until the IV-GIN condition is satisfied or the length of the conditional set equals $|\mathbf{O}| - 1$ (Lines 2∼14 of Algorithm 1). The details of the above process are given in Algorithm 1.
Algorithm 1: IV-GIN
 Input: Treatment $X$, outcome $Y$, and set of observed variables $\mathbf{O}$.
 Output: Set of candidate IVs $\mathbf{C}$ and the corresponding conditional sets Conset.
  1: Initialize the set of candidate IVs: $\mathbf{C} = \emptyset$, the conditional sets: Conset $= \emptyset$, the length of the conditional set: ConsetLen $= 0$, and Tag $= \mathbf{O}$;
  2: while ConsetLen $<$ |Tag| do
  3:    for each variable $Z_i \in$ Tag do
  4:     repeat
  5:         Select a subset $\mathbf{W}$ from $\mathbf{O} \setminus Z_i$ such that $|\mathbf{W}| =$ ConsetLen;
  6:         if $[Z_i \,||\, \mathbf{W}]$ follows the IV-GIN condition then
  7:           Add $Z_i$ into $\mathbf{C}$, and delete $Z_i$ from Tag;
  8:           Set Conset($Z_i$) $= \mathbf{W}$;
  9:           Break the repeat loop of line 4;
  10:         end if
  11:      until all subsets with length ConsetLen in $\mathbf{O} \setminus Z_i$ are selected;
  12:    end for
  13:     ConsetLen = ConsetLen + 1;
  14: end while
  15: Return: $\mathbf{C}$ and Conset
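The search loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the IV-GIN test is passed in as a callable `follows_iv_gin(z, w)` (a hypothetical interface), the outer loop iterates over the variables still in Tag, and the loop bound follows the listing. The usage example replaces the statistical test with an oracle that encodes the ground truth described for scenario S3 in the experiments.

```python
from itertools import combinations

def select_candidate_ivs(observed, follows_iv_gin):
    """Return (candidates, conset) following the structure of Algorithm 1.

    observed       -- list of observed variable names (the set O)
    follows_iv_gin -- callable (z, w_set) -> bool; assumed to wrap an IV-GIN test
    """
    candidates = []        # the set C, kept as a list to preserve discovery order
    conset = {}            # Conset: candidate -> conditional set that passed
    tag = list(observed)   # variables not yet accepted as candidates
    size = 0               # ConsetLen
    while size < len(tag):
        for z in list(tag):                       # iterate over a copy; tag shrinks
            others = [v for v in observed if v != z]
            for w in combinations(others, size):  # all subsets of O \ {z} of this size
                if follows_iv_gin(z, set(w)):
                    candidates.append(z)
                    tag.remove(z)
                    conset[z] = set(w)
                    break                         # stop at the first passing subset
        size += 1
    return candidates, conset

# Oracle standing in for the statistical test (assumption for illustration):
# Z2 passes with an empty conditional set, Z1 passes when conditioning on Z3.
def oracle(z, w):
    return (z == "Z2" and w == set()) or (z == "Z1" and w == {"Z3"})

candidates, conset = select_candidate_ivs(["Z1", "Z2", "Z3", "Z4"], oracle)
print(candidates, conset)
```

Trying smaller conditional sets first means each accepted variable is paired with a smallest conditional set that passes the test, which matches the minimality requirement (no proper subset of $\mathbf{W}$ passes) used in the theoretical results.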
In practice, the main issue is how to test IV-GIN conditions, i.e., for any two sets of variables $\mathbf{P}$ and $\mathbf{Q}$, we need to test the independence between $E_{\mathbf{P}||\mathbf{Q}}$ and $\mathbf{Q}$. To do so, we check for pairwise independence and combine the results with Fisher’s method [49] instead of testing for the independence between $E_{\mathbf{P}||\mathbf{Q}}$ and $\mathbf{Q}$ directly. In particular, denote by $p_k$, with $k = 1, 2, \ldots, |\mathbf{Q}|$, the p-values resulting from the pairwise independence tests, for which we use the Hilbert–Schmidt independence criterion (HSIC)-based independence test [50] due to the non-Gaussianity of the data. We compute the test statistic as $-2\sum_{k=1}^{|\mathbf{Q}|} \log p_k$, which follows the chi-square distribution with $2|\mathbf{Q}|$ degrees of freedom when all the pairs are independent.
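The combination step can be written without external libraries, because the chi-square survival function has a closed form when the degrees of freedom are even. The sketch below takes the pairwise p-values as given (how they are obtained, e.g., from an HSIC test, is outside its scope) and assumes every p-value is strictly positive; the function name is an assumption for illustration.

```python
import math

def fisher_combine(p_values):
    """Combine p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(log p_k) and
    combined_p is the upper tail of a chi-square with 2 * len(p_values) dof.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # For 2k degrees of freedom: P(chi2 > s) = exp(-s/2) * sum_{i<k} (s/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total

stat, p_combined = fisher_combine([0.04, 0.20])
print(round(stat, 3), round(p_combined, 4))
```

A small combined p-value would lead to rejecting independence between the surrogate-variable and $\mathbf{Q}$, i.e., to concluding that the IV-GIN condition is violated at the chosen significance level.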
Theorem 4 (Completeness of IV-GIN). Suppose that the data $\mathbf{V} = \{X, Y\} \cup \mathbf{U} \cup \mathbf{O}$ strictly follow the linear non-Gaussian acyclic causal model, that is, all the model assumptions are met, and the sample size is infinite. Furthermore, assume that there exists at least one valid IV $Z$ conditioning on $\mathbf{W}$ for the relation $X \to Y$, where $Z \cup \mathbf{W} \subseteq \mathbf{V}$. Then, the output $\mathbf{C}$ of the IV-GIN method must contain all valid IVs.

7. Experiments on Synthetic Data

In this section, we evaluate the IV selection performance on synthetic data and demonstrate the correctness of proposed theories.
Comparisons: We make comparisons with two state-of-the-art methods: the sisVIVE algorithm [20], which needs more than half of the variables to be valid IVs, and the IV-TETRAD algorithm [21], which needs two or more variables to be valid IVs. (Here, we adopt the two functions, TestTetrad and TestResiuals, to select IVs in the IV-TETRAD algorithm.) The source codes of sisVIVE and IV-TETRAD are available from https://mirrors.sjtug.sjtu.edu.cn/cran/web/packages/sisVIVE/index.html (accessed on 20 January 2022) and http://www.homepages.ucl.ac.uk/~ucgtrbd/code/iv_discovery/ (accessed on 20 January 2022), respectively.
Scenarios: We designed three scenarios, as shown in Figure 9, where X is treatment, Y is outcome, the variables U i ( i = 1 , 2 ) are unobserved, and Z j ( j = 1 , , 4 ) are potential IVs. For scenarios S 1 and S 2 , nodes Z 2 and Z 3 both are valid IVs conditioning on an empty set relative to X Y , and node Z 1 is an invalid IV due to the path Z 1 U 1 Y . The key difference between scenarios S 1 and S 2 is that there is an active nondirected path between Z 3 and X in S 2 while not in S 1 . For scenario S 3 , Z 1 is a valid IV conditioning on Z 3 relative to X Y , Z 2 is a valid IV conditioning on an empty set relative to X Y , Z 3 is an invalid IV due to the paths Z 3 Y and Z 3 U 1 Y , and Z 4 is an invalid IV due to the path X Z 4 Y .
Metrics: To evaluate the accuracy of the selected IVs, we used the following two metrics:
  • Correct-selecting rate: The number of correctly selected valid IVs divided by the total number of valid IVs in the ground-truth graph.
  • Selection commission: The number of falsely detected IVs divided by the total number of selected IVs in the output C of the current algorithm.
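The two metrics can be computed as in the following minimal sketch. The function name `selection_metrics` is hypothetical, and returning a commission of 0 when nothing is selected is an assumed convention that the text does not specify.

```python
def selection_metrics(selected, valid):
    """Return (correct-selecting rate, selection commission).

    selected: variables output by an algorithm; valid: ground-truth valid IVs.
    """
    selected, valid = set(selected), set(valid)
    # Correctly selected valid IVs over all valid IVs in the ground truth.
    correct_rate = len(selected & valid) / len(valid) if valid else float("nan")
    # Falsely selected variables over all selected variables.
    # Assumption: an empty selection counts as zero commission.
    commission = len(selected - valid) / len(selected) if selected else 0.0
    return correct_rate, commission
```

For example, selecting {Z1, Z2, Z3} when only Z2 and Z3 are valid gives a correct-selecting rate of 1 and a selection commission of 1/3.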
Experimental setup: We generated data from a linear non-Gaussian acyclic causal model according to the above three scenarios. In detail, the causal strength $b_{ij}$ was generated uniformly in $[-2, -0.5] \cup [0.5, 2]$ and the non-Gaussian noise terms were generated from exponential distributions to the second power. Here, we conducted experiments with the following tasks:
T1. Sensitivity to sample size. We considered sample sizes $N$ = 1k, 3k, 5k, where k = 1000.
T2. Sensitivity to unmeasured confounding between $X$ and $Y$. The coefficients between $\{X, Y\}$ and $U_1$ are set such that $b_{XU_1} = b_{YU_1} = \lambda$, at two levels, $\lambda \in \{0.125, 0.25\}$, as in [21]. The sample size $N$ is 5000.
We used HSIC-based independence tests [50] for the IV-GIN condition due to the non-Gaussianity of the data. Each experiment was repeated 50 times with randomly generated data, and the results were averaged.
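The data-generating recipe above can be illustrated with a short sketch. The graph used here (a single candidate Z with Z → X → Y and one latent confounder U of X and Y) is a simplified stand-in chosen for illustration, not one of the scenarios in Figure 9; the function names, the unit rate of the exponential distribution, and the absence of centering are likewise assumptions.

```python
import random

def sample_strength(rng):
    """Draw a causal strength uniformly from [-2, -0.5] U [0.5, 2]."""
    magnitude = rng.uniform(0.5, 2.0)
    return magnitude if rng.random() < 0.5 else -magnitude

def sample_noise(rng):
    """Non-Gaussian noise: an exponential variate raised to the second power."""
    return rng.expovariate(1.0) ** 2  # assumed rate 1.0

def generate_iv_data(n, lam, seed=0):
    """Simulate (z, x, y) rows from the assumed graph Z -> X -> Y, X <- U -> Y."""
    rng = random.Random(seed)
    b_zx, b_xy = sample_strength(rng), sample_strength(rng)
    rows = []
    for _ in range(n):
        u = sample_noise(rng)  # latent confounder, never returned
        z = sample_noise(rng)
        x = b_zx * z + lam * u + sample_noise(rng)
        y = b_xy * x + lam * u + sample_noise(rng)
        rows.append((z, x, y))
    return rows, b_xy

rows, true_effect = generate_iv_data(n=5000, lam=0.25, seed=1)
```

Here `lam` plays the role of the confounding coefficient λ from task T2, and `true_effect` is the simulated strength of X → Y.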
Results on Task T1: The experimental results are reported in Table 1. From the table, we can see that our proposed IV-GIN outperforms the other methods on both evaluation metrics in all three scenarios and at all sample sizes, indicating that the IV-GIN condition can be tested in a wider range of settings than the conditions used by the other algorithms in linear non-Gaussian causal models. We found that the IV-TETRAD algorithm does not perform well, especially in scenarios $S_2$ and $S_3$, indicating that it fails when there is an active nondirected path between a valid IV and the treatment $X$ (scenario $S_2$) and when only a single IV is present (scenario $S_3$). We further noticed that the sisVIVE algorithm does not perform well in scenario $S_3$. This is because fewer than half of the variables are valid IVs conditioning on the same set in scenario $S_3$.
Results on Task T2: The experimental results are reported in Table 2. It is worth noting that stronger confounding makes it more difficult to select valid IVs. From the table, we found that IV-GIN performs better than the other methods under both confounding coefficients in almost all scenarios, indicating that our IV-GIN condition is more efficient than the other algorithms. We noticed that although the correct-selecting rate of sisVIVE is higher than that of IV-GIN in scenario $S_1$ when $\lambda = 0.25$, the selection commission of IV-GIN is lower than that of sisVIVE (lower is better for selection commission).
To conclude, these findings show a clear advantage of our method over the compared algorithms.

8. Application to Vitamin D Data

In this section, we apply our algorithm to the Vitamin D data set described by Skaaby et al. [51], where the data we analyze come from the population-based study Monica10. The data we use were collected from 2571 individuals aged 40–71 years, as reported in [52]. In detail, these data contain 5 variables: the treatment Vitamin D status (a continuous variable), the outcome mortality, filaggrin genotype, age, and time (follow-up time). As argued by Martinussen et al. [52], unmeasured confounding may arise between Vitamin D status and mortality due to behavioral and environmental factors. To estimate the causal effect of Vitamin D status on mortality, one may use the filaggrin genotype as an instrumental variable, as reported by Martinussen et al. [52]. In our setup, the problem of interest is to verify, without prior knowledge of the causal structure, that filaggrin genotype is a valid IV while age and time are not.
Here, we also make comparisons with the sisVIVE algorithm and the IV-TETRAD algorithm. In the implementation, the significance level of all methods was set to 0.01. We have the following findings: (1) The output of IV-GIN is that filaggrin genotype is a valid IV while age and time are invalid, which indicates the effectiveness of our method. (2) The output of IV-TETRAD is an empty set. This is because there is only one valid IV, which violates its basic assumption (two or more variables are valid IVs in the system). (3) The output of sisVIVE is that age is a valid IV while filaggrin genotype and time are invalid. This implies that sisVIVE fails to find the valid IV, i.e., filaggrin genotype. One reason is that fewer than half of the variables are valid IVs in this dataset. These results again indicate that our algorithm performs better than the other algorithms at selecting valid IVs.

9. Discussion

The preceding sections presented how to use IV-GIN conditions to select the set of candidate IVs relative to a target causal influence $X \to Y$ from observed variables without prior knowledge of the causal structure. In this section, we discuss the following two practical questions.
Is it possible to select IVs by learning the whole causal graph? In fact, it is challenging to discover the precise causal graph in the presence of arbitrary hidden variables. To show this, we apply the LRpSC+GES algorithm introduced by [43] to learn the graphs of the three scenarios in Section 7. For simplicity, we set the sample size to N = 5k. We identify the IVs according to the instrument criteria given the learned graph. In detail, if there is a direct edge between a candidate variable Z and the treatment X and there is no direct edge between Z and the outcome Y, we treat Z as a candidate IV. (Note that this selection rule is relatively loose and not rigorous.) The results are given in Table 3. From the table, we can see that the correct-selecting rate is close to 0.1, which indicates that almost all valid IVs have been incorrectly removed from the candidate set of IVs. We note that the selection commissions are small in the three scenarios. The reason is that in most cases, a valid IV Z has a direct edge to both the treatment X and the outcome Y in the graph learned by the LRpSC+GES algorithm. These findings show that, given the graph learned by the LRpSC+GES algorithm, one cannot correctly select the set of candidate IVs.
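The loose graph-based selection rule described above can be written down in a few lines. This is a sketch under assumptions: the learned graph is represented as a plain list of adjacent node pairs (edge orientation is ignored, as in the rule itself), and the function name `candidate_ivs_from_graph` is hypothetical rather than part of LRpSC+GES.

```python
def candidate_ivs_from_graph(edges, candidates, treatment="X", outcome="Y"):
    """Keep candidates adjacent to the treatment but not to the outcome."""
    adjacent = {frozenset(edge) for edge in edges}
    return [
        z for z in candidates
        if frozenset((z, treatment)) in adjacent
        and frozenset((z, outcome)) not in adjacent
    ]

# Toy learned graph: Z1 is adjacent to both X and Y, Z2 only to X.
learned_edges = [("Z1", "X"), ("Z1", "Y"), ("Z2", "X"), ("X", "Y")]
print(candidate_ivs_from_graph(learned_edges, ["Z1", "Z2"]))
```

In the toy graph only Z2 survives, which mirrors the failure mode reported in the text: a valid IV that the learned graph connects to both X and Y is discarded.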
What happens if we have no background knowledge about $X \to Y$? Theoretically speaking, the IV-GIN algorithm does not need to restrict the relation between X and Y, and the output C of the IV-GIN algorithm contains all valid IVs for the ground-truth relation, e.g., $X \to Y$ or $Y \to X$. This is because we do not restrict the order of X and Y when we test whether $(\{Z, \mathbf{W}\}, \{X, Y, \mathbf{W}\})$ satisfies the GIN condition in Theorem 2. To show this, for the three scenarios in Section 7, we reverse the order of X and Y so that the relation becomes $Y \to X$ and run our method on these graphs. For simplicity, we set the sample size to N = 5k. The results are shown in Table 4. From this table, we can see that both metrics are close to those obtained for the original graphs with the causal influence $X \to Y$ in Table 1, indicating that our method does not rule out the valid IVs relative to the ground-truth relationship. It is noteworthy that if one needs to calculate the causal effect between X and Y, the causal order of X and Y must be given in advance. This is because the IV estimator is based on the order of X and Y (see Equation (1)).
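The dependence of the estimator on the causal order can be seen in a small sketch. It assumes that Equation (1) is the standard ratio-of-covariances IV estimator, cov(outcome, Z) / cov(treatment, Z); the helper names are hypothetical and the toy data are noise-free for readability.

```python
def covariance(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

def iv_estimate(z, treatment, outcome):
    """Ratio estimator cov(outcome, z) / cov(treatment, z) (assumed form)."""
    return covariance(outcome, z) / covariance(treatment, z)

z = [0.0, 1.0, 2.0, 3.0]
x = [1.0, 3.0, 5.0, 7.0]    # x = 2 z + 1
y = [3.0, 9.0, 15.0, 21.0]  # y = 3 x

effect_x_on_y = iv_estimate(z, treatment=x, outcome=y)
effect_y_on_x = iv_estimate(z, treatment=y, outcome=x)
```

Swapping the roles of the two variables simply inverts the ratio, so the same instrument yields a different number depending on which variable is declared the treatment.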

10. Conclusions and Further Work

In this paper, we investigated the problem of testability of instrumental variables in linear non-Gaussian acyclic causal models. In particular, we proposed a necessary condition for detecting valid IVs relative to a target causal influence $X \to Y$, which is called the IV-GIN condition. We then gave the graphical implications of the IV-GIN condition in linear non-Gaussian acyclic causal models. We showed how the conditions of the instrument criteria can be checked by exploiting the IV-GIN conditions. Moreover, we proposed a sequential method that selects the set of candidate IVs for the target causal influence $X \to Y$ from observational data without precise prior knowledge of the causal structure.
The key differences from existing research on the testability of IVs in a linear non-Gaussian acyclic causal model, such as IV-TETRAD [21,53], are that (1) we studied the testability of both conditions 1 and 2, while IV-TETRAD only studies the testability of condition 2 (treating condition 1 as prior knowledge), and that (2) we investigated the case where a single IV is present in the ground-truth graph, while IV-TETRAD needs at least two IVs to be present. It is worth noting that one can verify the validity of condition 2a using the IV-GIN method in cases where at least two instruments are present in the ground-truth graph. However, the IV-TETRAD condition is too restrictive and rules out some valid IVs. Table 5 summarizes the testability results using the IV-GIN conditions and the IV-TETRAD conditions.
There is another way of estimating the causal effect of X on Y in a linear non-Gaussian acyclic causal model. For instance, Refs. [37,40] show that the causal effect between any two observed variables is partially identifiable (the output is an equivalence class of causal effects) by using overcomplete independent component analysis (O-ICA) [54]. One may naturally ask: is it necessary to select the IV for estimating the causal effect of X on Y? In fact, as stated in [21], for O-ICA-based methods, the size of the equivalence class of the identified causal effects could be very large, and the number of unmeasured confounders between X and Y is not clear. Therefore, it is necessary to select the valid IV relative to a target causal influence $X \to Y$ when there exist latent confounders between X and Y and the number of latent confounders is not known in advance.
One direction of future work is to extend the IV-GIN condition to the case of a nonlinear additive noise model, and existing techniques [55,56,57] may help to address this issue.

Author Contributions

Conceptualization, F.X., Y.H., Z.G. and K.Z.; methodology, F.X., Y.H., Z.G. and K.Z.; experiments, Z.C. and F.X.; validation, F.X., Y.H., Z.G., Z.C. and K.Z.; formal analysis, F.X., Y.H., Z.G. and K.Z.; investigation, F.X., Y.H., Z.G. and K.Z.; writing—original draft preparation, F.X., Y.H., Z.G. and K.Z.; writing—review and editing, F.X., R.H. and K.Z.; visualization, F.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the China Postdoctoral Science Foundation (020M680225, BX20200011), the National Natural Science Foundation of China (NSFC 11771028, 12071015, 11971040), and Huawei Technologies. K.Z. would like to acknowledge the support by the National Institutes of Health (NIH) under Contract R01HL159805, by the NSF-Convergence Accelerator Track-D award #2134901, and by the United States Air Force under Contract No. FA8650-17-C7715.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The simulated data can be regenerated using the code, which can be provided to interested users via an email request to the corresponding author. The Vitamin D data used in the experiments come from the ivtools package on CRAN, which can be downloaded from https://mirrors.sjtug.sjtu.edu.cn/cran/web/packages/ivtools/index.html (accessed on 20 January 2022).

Acknowledgments

The authors are grateful to the editors and anonymous reviewers for their insightful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs

Before we present the proofs of our results, we need an important theorem, which gives a mathematical characterization of the GIN condition [24]. For simplicity, the notation $\mathbf{P} \perp\!\!\!\perp \mathbf{Q}$ denotes that $\mathbf{P}$ is independent of $\mathbf{Q}$, and the notation $\mathbf{P} \not\perp\!\!\!\perp \mathbf{Q}$ denotes that $\mathbf{P}$ is not independent of $\mathbf{Q}$.
Theorem A1.
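Suppose that random vectors $\mathbf{S}$, $\mathbf{P}$, and $\mathbf{Q}$ are related in the following way:
$$\mathbf{P} = A\mathbf{S} + E_{\mathbf{P}}, \tag{A1}$$
$$\mathbf{Q} = B\mathbf{S} + E_{\mathbf{Q}}. \tag{A2}$$
Denote by $l$ the dimensionality of $\mathbf{S}$. Assume $A$ is of full column rank. Then, if (1) $\mathrm{Dim}(\mathbf{P}) > l$, (2) $E_{\mathbf{P}} \perp\!\!\!\perp \mathbf{S}$, (3) $E_{\mathbf{P}} \perp\!\!\!\perp E_{\mathbf{Q}}$, and (4) the cross-covariance matrix of $\mathbf{S}$ and $\mathbf{Q}$, $\Sigma_{\mathbf{S}\mathbf{Q}} = \mathbb{E}[\mathbf{S}\mathbf{Q}^{\top}]$, has rank $l$, then $E_{\mathbf{P}||\mathbf{Q}} \perp\!\!\!\perp \mathbf{Q}$, i.e., $(\mathbf{Q}, \mathbf{P})$ satisfies the GIN condition.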
Proof. 
The proof was given by Xie et al. [24]. □
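As a small worked instance, consider the simplest IV setting with $\mathbf{P} = (X, Y)^{\top}$ and $\mathbf{Q} = Z$. The derivation below assumes the definition of the GIN surrogate used in [24], namely $E_{\mathbf{P}||\mathbf{Q}} = \omega^{\top}\mathbf{P}$ for a nonzero $\omega$ with $\omega^{\top}\mathbb{E}[\mathbf{P}\mathbf{Q}^{\top}] = 0$, and it assumes zero-mean variables with $\sigma_{XZ} \neq 0$; it is meant as an illustration, not as part of the proof.

```latex
\begin{align*}
\mathbb{E}[\mathbf{P}\mathbf{Q}^{\top}]
  &= \begin{pmatrix} \sigma_{XZ} \\ \sigma_{YZ} \end{pmatrix}, \\
\omega^{\top}\mathbb{E}[\mathbf{P}\mathbf{Q}^{\top}] = 0
  &\;\Longrightarrow\;
  \omega \propto \begin{pmatrix} -\sigma_{YZ}/\sigma_{XZ} \\ 1 \end{pmatrix}, \\
E_{\mathbf{P}||\mathbf{Q}} = \omega^{\top}\mathbf{P}
  &= Y - \frac{\sigma_{YZ}}{\sigma_{XZ}}\,X .
\end{align*}
```

Under these assumptions the surrogate is the residual of $Y$ after subtracting the ratio-of-covariances multiple of $X$, and the GIN condition asks whether this residual is independent of $Z$.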

Appendix A.1. Proof of Theorem 3

Proof. 
The “if” part: First, suppose that there exists a node $C \in \mathbf{V}$, $C \notin \mathbf{W}$, such that for every trek $\pi$ between a node $V_p \in \{X, Y, \mathbf{W}\}$ and a node $V_q \in \{Z, \mathbf{W}\}$, (a) $\pi$ goes through at least one node in $\{C, \mathbf{W}\}$, denoted by $V_k$, and (b) $V_k$ has its arrow pointing to $V_p$ in $\pi$. According to the linear acyclic model, each $V_p \in \{X, Y, \mathbf{W}\}$ is a linear function of $\mathrm{Pa}(V_p)$ plus independent noise. Because of subconditions (a) and (b), $V_p$ can be written as a linear function of $\{C, \mathbf{W}\}$ and an independent error $\varepsilon_{V_p}$, where $\varepsilon_{V_p}$ is independent of $\{C, \mathbf{W}\}$, that is,
$$V_p = A_p \begin{bmatrix} C \\ \mathbf{W} \end{bmatrix} + \varepsilon_{V_p}. \tag{A3}$$
We write $\{X, Y, \mathbf{W}\}$ in matrix form:
$$\begin{bmatrix} X \\ Y \\ \mathbf{W} \end{bmatrix} = A \begin{bmatrix} C \\ \mathbf{W} \end{bmatrix} + E_{\mathbf{P}}, \tag{A4}$$
where $A$ is an appropriate linear transformation and $E_{\mathbf{P}}$ is independent of $\{C, \mathbf{W}\}$, although its components are not necessarily independent of each other. Note that, in Equation (A4), $\{C, \mathbf{W}\}$ and $E_{\mathbf{P}}$ are linear combinations of disjoint sets of the noise terms, as implied by the directed acyclic structure over all variables.
We now write $\{Z, \mathbf{W}\}$ as linear combinations of the noise terms. Because of subcondition (a), i.e., every trek $\pi$ between a node $V_q \in \{Z, \mathbf{W}\}$ and a node $V_p \in \{X, Y, \mathbf{W}\}$ goes through at least one node in $\{C, \mathbf{W}\}$, and because a trek, by definition, does not contain any colliders, $\{C, \mathbf{W}\}$ d-separates $\{X, Y, \mathbf{W}\}$ from $\{Z, \mathbf{W}\}$. If any noise term $\varepsilon_i$ is present in $E_{\mathbf{P}}$, it is not among the noise terms in the expression of $\{Z, \mathbf{W}\}$. Otherwise, if some component $Z_j$ of $\{Z, \mathbf{W}\}$ also involves $\varepsilon_i$, then the direct effect of $\varepsilon_i$, among all variables $\mathbf{V}$, is a common cause of $Z_j$ and some component of $\{X, Y, \mathbf{W}\}$. This implies that the path between $Z_j$ and that component of $\{X, Y, \mathbf{W}\}$ cannot be d-separated by $\{C, \mathbf{W}\}$ because no component of $\{C, \mathbf{W}\}$ is on the path, as implied by the fact that when $\{C, \mathbf{W}\}$ is written as a linear combination of the underlying noise terms, $\varepsilon_i$ is not among them. Consequently, no noise term in $E_{\mathbf{P}}$ contributes to $\{C, \mathbf{W}\}$ or $\{Z, \mathbf{W}\}$. Hence, $\{Z, \mathbf{W}\}$ can be expressed as
$$\begin{bmatrix} Z \\ \mathbf{W} \end{bmatrix} = B \begin{bmatrix} C \\ \mathbf{W} \end{bmatrix} + E_{\mathbf{Q}}, \tag{A5}$$
where $E_{\mathbf{Q}}$, which is determined by $\{C, \mathbf{W}\}$ and $\{Z, \mathbf{W}\}$, is independent of $E_{\mathbf{P}}$.
Moreover, because of condition (2), i.e., there is at least one directed path between any one node in $\{C, \mathbf{W}\}$ and any one node in $\{X, Y\}$, we know that the cross-covariance matrix of $\{C, \mathbf{W}\}$ and $\{Z, \mathbf{W}\}$, $\Sigma_{\{C, \mathbf{W}\}\{Z, \mathbf{W}\}} = \mathbb{E}[\{C, \mathbf{W}\}\{Z, \mathbf{W}\}^{\top}]$, has rank $|\{C, \mathbf{W}\}|$, and that $A$ is of full column rank. Based on the above analysis, the four conditions in Theorem A1 are satisfied. This implies that $(\{Z, \mathbf{W}\}, \{X, Y, \mathbf{W}\})$ satisfies the GIN condition, i.e., $[Z \,||\, \mathbf{W}]$ follows the IV-GIN condition relative to $X \to Y$.
Now, consider any proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$. Because of condition (3), i.e., there is no proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$ that satisfies conditions (1) and (2), we know that $(\{Z, \tilde{\mathbf{W}}\}, \{X, Y, \tilde{\mathbf{W}}\})$ violates the GIN condition for any proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$. Therefore, there is no proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$ such that $[Z \,||\, \tilde{\mathbf{W}}]$ follows the IV-GIN condition relative to $X \to Y$.
The “only-if” part: We suppose that $[Z \,||\, \mathbf{W}]$ follows the IV-GIN condition relative to $X \to Y$ and that there is no proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$ such that $[Z \,||\, \tilde{\mathbf{W}}]$ follows the IV-GIN condition relative to $X \to Y$. That is to say, $(\{Z, \mathbf{W}\}, \{X, Y, \mathbf{W}\})$ satisfies the GIN condition while there is no proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$ such that $(\{Z, \tilde{\mathbf{W}}\}, \{X, Y, \tilde{\mathbf{W}}\})$ follows the GIN condition. Consider all nodes $C \in \mathbf{V}$, $C \notin \mathbf{W}$, such that $C$ is causally earlier than $\{X, Y\}$; we show that at least one of them satisfies conditions (1) and (2).
First, if condition (1) is violated, then there is a trek $\tau$ between some leaf node in $\mathrm{Pa}(\{X, Y, \mathbf{W}\})$, denoted by $\mathrm{Pa}(V_z)$ ($V_z \in \{X, Y, \mathbf{W}\}$), and some component of $\{Z, \mathbf{W}\}$, denoted by $Z_j$, and this trek does not go through any common cause of the variables in $\mathrm{Pa}(\{X, Y, \mathbf{W}\})$. Then, they have some common cause that does not cause any other variable in $\mathrm{Pa}(\{X, Y, \mathbf{W}\})$. Consequently, there exists at least one noise term, denoted by $\varepsilon_i$, that contributes to both $\mathrm{Pa}(V_z)$ (and hence $V_z$) and $Z_j$ but not to any other variable in $\{X, Y, \mathbf{W}\}$. Because of the non-Gaussianity of the noise terms and the Darmois–Skitovitch theorem, if any linear projection of $\{X, Y, \mathbf{W}\}$, $\omega^{\top}\{X, Y, \mathbf{W}\}$, is independent of $\{Z, \mathbf{W}\}$, the linear coefficient for $V_z$ must be zero. Hence, $(\{Z, \mathbf{W}\}, \{X, Y, \mathbf{W}\} \setminus \{V_z\})$ satisfies GIN, which contradicts the assumption in the theorem. Therefore, there must exist some $\{C, \mathbf{W}\}$ such that condition (1) holds.
Next, suppose that condition (2) is violated, i.e., there exist one node in $\{C, \mathbf{W}\}$ and one node in $\{X, Y\}$ such that there is no trek between $\{C, \mathbf{W}\}$ and $\{X, Y, \mathbf{W}\}$. This implies that at least one of the following cases holds: (a) the column rank of the covariance matrix of $\{C, \mathbf{W}\}$ and $\{X, Y, \mathbf{W}\}$ is smaller than $|\{C, \mathbf{W}\}|$, or (b) the rank of the covariance matrix of $\{C, \mathbf{W}\}$ and $\{Z, \mathbf{W}\}$ is smaller than $|\{C, \mathbf{W}\}|$. Then, the condition $\omega^{\top}\mathbb{E}[\{X, Y, \mathbf{W}\}\{Z, \mathbf{W}\}^{\top}] = 0$ does not guarantee that $\omega^{\top}A = 0$. Under the faithfulness assumption, $\omega^{\top}\{X, Y, \mathbf{W}\}$ is then not independent of $\{Z, \mathbf{W}\}$. Hence, condition (2) also needs to hold.
Because there is no proper subset $\tilde{\mathbf{W}}$ of $\mathbf{W}$ such that $(\{Z, \tilde{\mathbf{W}}\}, \{X, Y, \tilde{\mathbf{W}}\})$ follows the GIN condition, one can immediately see that condition (3) holds. □

Appendix A.2. Proof of Theorem 2

Proof. 
We prove this result by Theorem 3. To this end, we need to show that the three conditions of Theorem 3 hold.
Because $Z$ is a valid IV conditioning on $\mathbf{W}$ relative to $X \to Y$, the instrument criteria hold. Take the node $C$ in Theorem 3 to be $X$; we show that every trek $\pi$ between a node $V_p \in \{X, Y, \mathbf{W}\}$ and a node $V_q \in \{Z, \mathbf{W}\}$ satisfies subconditions (a) and (b). First, because of condition 2 of the instrument criteria, i.e., $\mathbf{W}$ d-separates $Z$ from $Y$ in the graph obtained by removing the edge $X \to Y$ from $G$, we have that $\pi$ goes through at least one node in $\{X, \mathbf{W}\}$, denoted by $V_k$. That is to say, subcondition (a) holds. Next, because of condition 1 of the instrument criteria, i.e., $\mathbf{W}$ contains only nondescendants of $Y$ in $G$, we have that $V_k$ is causally earlier than $Y$ on $\pi$. Besides, because of $X \to Y$, we further know that $V_k$ is causally earlier than $V_p$ on $\pi$, i.e., subcondition (b) holds.
Moreover, because of condition 3 of the instrument criteria, i.e., $\mathbf{W}$ does not d-separate $Z$ from $X$ in $G$, and because of $X \to Y$, there is at least one directed path between any one node in $\{X, \mathbf{W}\}$ and any one node in $\{X, Y\}$, i.e., condition (2) holds. □

Appendix A.3. Proof of Proposition 1

Proof. 
Without loss of generality, assume that node $V_r$ in $\{Z, \mathbf{W}\}$ is a descendant of $Y$ in $G$ and that there exists a node $C \in \mathbf{V}$, $C \notin \mathbf{W}$, satisfying the conditions in Theorem 3. We show that subcondition (b) in Theorem 3 is violated.
Because of conditions 2 and 3 of the instrument criteria, every trek $\pi$ between a node $V_p \in \{X, Y, \mathbf{W}\}$ and a node $V_q \in \{Z, \mathbf{W}\}$ goes through at least one node in $\{C, \mathbf{W}\}$, denoted by $V_k$. Because node $V_r$ is a descendant of $Y$ and $V_r \in \{Z, \mathbf{W}\}$, there must exist a trek $\tau$ between $\{X, Y, \mathbf{W}\}$ and $\{Z, \mathbf{W}\}$ such that $Y$ has its arrow pointing to $V_k$, which contradicts subcondition (b) in Theorem 3 ($V_k$ has its arrow pointing to $Y$). □

Appendix A.4. Proof of Proposition 2

Proof. 
Because there is no node $C \in \mathbf{V}$ such that all active paths between $Z$ and $Y$ go through $C$ and $C$ has its arrow pointing to $Y$, there must exist a trek $\tau$ between $Z$ and $Y$ such that $\tau$ does not go through $C$, or $\tau$ goes through $C$ but $Y$ has its arrow pointing to $C$ in $\tau$. This implies that condition (1) of Theorem 3 is violated, i.e., the condition that there exists a node $C \in \mathbf{V}$, $C \notin \mathbf{W}$, such that for every trek $\pi$ between a node $V_p \in \{X, Y, \mathbf{W}\}$ and a node $V_q \in \{Z, \mathbf{W}\}$, (a) $\pi$ goes through at least one node in $\{C, \mathbf{W}\}$, denoted by $V_k$, and (b) $V_k$ has its arrow pointing to $V_p$ in $\pi$. Thus, $[Z \,||\, \mathbf{W}]$ violates the IV-GIN condition. □

Appendix A.5. Proof of Theorem 4

Proof. 
The validity of a variable as an IV depends on which set $\mathbf{W}$ we condition on. If a node $Z_i$ is a valid IV conditioning on $\mathbf{W}$, it is not necessary to verify whether $Z_i$ is a valid IV conditioning on $\mathbf{W}'$, where $\mathbf{W}'$ contains $\mathbf{W}$. Therefore, given an observed variable $Z_i$, one first tests the IV-GIN condition with an empty conditioning set and then increases the number of conditioning variables until the IV-GIN condition is satisfied or the size of the conditioning set equals $|\mathbf{O}| - 1$. The process in Lines 2–14 of the IV-GIN algorithm is consistent with the above process. Besides, by Theorem 2, valid IVs are never removed, which ensures that the output C of the IV-GIN method must contain all valid IVs relative to $X \to Y$. □
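The search order described in this proof can be sketched as follows. This is an illustrative loop under stated assumptions, not the authors' Algorithm: `satisfies_iv_gin` is a hypothetical callback standing in for the statistical IV-GIN test, `observed` stands for the observed variables other than the treatment and outcome, and the function name `iv_gin_search` is invented for the example.

```python
from itertools import combinations

def iv_gin_search(observed, satisfies_iv_gin):
    """For each candidate, grow the conditioning set until the test passes.

    observed: names of the observed candidate variables.
    satisfies_iv_gin(z, w): hypothetical test, True if [z || w] passes.
    Returns the selected candidates and their smallest conditioning sets.
    """
    selected, conditioning_sets = [], {}
    for z in observed:
        others = [v for v in observed if v != z]
        # Conditioning-set sizes 0, 1, ..., |O| - 1, smallest first.
        for size in range(len(others) + 1):
            hit = next(
                (w for w in combinations(others, size) if satisfies_iv_gin(z, w)),
                None,
            )
            if hit is not None:
                selected.append(z)
                conditioning_sets[z] = hit
                break  # supersets of a passing set need not be checked
    return selected, conditioning_sets
```

With an oracle in place of the statistical test, the loop returns each candidate together with the smallest conditioning set under which it passes, which is the property the completeness argument relies on.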

References

  1. Wright, P.G. Tariff on Animal and Vegetable Oils; Macmillan Company: New York, NY, USA, 1928.
  2. Goldberger, A.S. Structural equation methods in the social sciences. Econom. J. Econom. Soc. 1972, 40, 979–1001.
  3. Bowden, R.J.; Turkington, D.A. Instrumental Variables; Number 8; Cambridge University Press: Cambridge, UK, 1990.
  4. Pearl, J. Causality: Models, Reasoning, and Inference, 2nd ed.; Cambridge University Press: New York, NY, USA, 2009.
  5. Imbens, G.W. Instrumental Variables: An Econometrician’s Perspective. Stat. Sci. 2014, 29, 323–358.
  6. Imbens, G.W.; Rubin, D.B. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction; Cambridge University Press: Cambridge, UK, 2015.
  7. Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction, and Search; MIT Press: Cambridge, MA, USA, 2000.
  8. Hernán, M.A.; Robins, J.M. Instruments for causal inference: An epidemiologist’s dream? Epidemiology 2006, 17, 360–372.
  9. Baiocchi, M.; Cheng, J.; Small, D.S. Instrumental variable methods for causal inference. Stat. Med. 2014, 33, 2297–2340.
  10. Bound, J.; Jaeger, D.A.; Baker, R.M. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J. Am. Stat. Assoc. 1995, 90, 443–450.
  11. Pearl, J. On the testability of causal models with latent and instrumental variables. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1995; pp. 435–443.
  12. Manski, C.F. Partial Identification of Probability Distributions; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003.
  13. Palmer, T.M.; Ramsahai, R.R.; Didelez, V.; Sheehan, N.A. Nonparametric bounds for the causal effect in a binary instrumental-variable model. Stata J. 2011, 11, 345–367.
  14. Kitagawa, T. A test for instrument validity. Econometrica 2015, 83, 2043–2063.
  15. Wang, L.; Robins, J.M.; Richardson, T.S. On falsification of the binary instrumental variable model. Biometrika 2017, 104, 229–236.
  16. Kédagni, D.; Mourifié, I. Generalized instrumental inequalities: Testing the instrumental variable independence assumption. Biometrika 2020, 107, 661–675.
  17. Gunsilius, F.F. Nontestability of instrument validity under continuous treatments. Biometrika 2021, 108, 989–995.
  18. Kuroki, M.; Cai, Z. Instrumental variable tests for Directed Acyclic Graph Models. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, Bridgetown, Barbados, 6–8 January 2005; pp. 190–197.
  19. Spearman, C. Pearson’s contribution to the theory of two factors. Br. J. Psychol. 1928, 19, 95–101.
  20. Kang, H.; Zhang, A.; Cai, T.T.; Small, D.S. Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. J. Am. Stat. Assoc. 2016, 111, 132–144.
  21. Silva, R.; Shimizu, S. Learning instrumental variables with structural and non-gaussianity assumptions. J. Mach. Learn. Res. 2017, 18, 1–49.
  22. Sullivant, S.; Talaska, K.; Draisma, J. Trek separation for Gaussian graphical models. Ann. Stat. 2010, 38, 1665–1685.
  23. Spirtes, P. Calculation of Entailed Rank Constraints in Partially Non-linear and Cyclic Models. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence; AUAI Press: Arlington, VA, USA, 2013; pp. 606–615.
  24. Xie, F.; Cai, R.; Huang, B.; Glymour, C.; Hao, Z.; Zhang, K. Generalized Independent Noise Condition for Estimating Latent Variable Causal Graphs. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; pp. 14891–14902.
  25. Choi, M.J.; Tan, V.Y.; Anandkumar, A.; Willsky, A.S. Learning latent tree graphical models. J. Mach. Learn. Res. 2011, 12, 1771–1812.
  26. Chandrasekaran, V.; Parrilo, P.A.; Willsky, A.S. Latent variable graphical model selection via convex optimization. In Proceedings of the 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 29 September–1 October 2010; pp. 1935–1967.
  27. Meng, Z.; Eriksson, B.; Hero, A. Learning latent variable Gaussian graphical models. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1269–1277.
  28. Zorzi, M.; Sepulchre, R. AR identification of latent-variable graphical models. IEEE Trans. Autom. Control. 2015, 61, 2327–2340.
  29. Wu, C.; Zhao, H.; Fang, H.; Deng, M. Graphical model selection with latent variables. Electron. J. Stat. 2017, 11, 3485–3521.
  30. Kumar, S.; Ying, J.; de Miranda Cardoso, J.V.; Palomar, D.P. A Unified Framework for Structured Graph Learning via Spectral Constraints. J. Mach. Learn. Res. 2020, 21, 1–60.
  31. Ciccone, V.; Ferrante, A.; Zorzi, M. Learning latent variable dynamic graphical models by confidence sets selection. IEEE Trans. Autom. Control. 2020, 65, 5130–5143.
  32. Alpago, D.; Zorzi, M.; Ferrante, A. A scalable strategy for the identification of latent-variable graphical models. IEEE Trans. Autom. Control. 2021.
  33. Bertsimas, D.; Cory-Wright, R.; Johnson, N.A. Sparse Plus Low Rank Matrix Decomposition: A Discrete Optimization Approach. arXiv 2021, arXiv:2109.12701.
  34. Spirtes, P.; Meek, C.; Richardson, T. Causal inference in the presence of latent variables and selection bias. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence; Morgan Kaufmann Publishers Inc.: Burlington, MA, USA, 1995; pp. 499–506.
  35. Colombo, D.; Maathuis, M.H.; Kalisch, M.; Richardson, T.S. Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Stat. 2012, 40, 294–321.
  36. Kitson, N.K.; Constantinou, A.C.; Guo, Z.; Liu, Y.; Chobtham, K. A survey of Bayesian Network structure learning. arXiv 2021, arXiv:2109.11415.
  37. Hoyer, P.O.; Shimizu, S.; Kerminen, A.J.; Palviainen, M. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. Int. J. Approx. Reason. 2008, 49, 362–378.
  38. Entner, D.; Hoyer, P.O. Discovering unconfounded causal relationships using linear non-gaussian models. In JSAI International Symposium on Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2010; pp. 181–195.
  39. Tashiro, T.; Shimizu, S.; Hyvärinen, A.; Washio, T. ParceLiNGAM: A causal ordering method robust against latent confounders. Neural Comput. 2014, 26, 57–83.
  40. Salehkaleybar, S.; Ghassami, A.; Kiyavash, N.; Zhang, K. Learning Linear Non-Gaussian Causal Models in the Presence of Latent Variables. J. Mach. Learn. Res. 2020, 21, 1–24.
  41. Ciccone, V.; Ferrante, A.; Zorzi, M. Robust identification of “sparse plus low-rank” graphical models: An optimization approach. In Proceedings of the 2018 IEEE Conference on Decision and Control (CDC), Miami, FL, USA, 17–19 December 2018; pp. 2241–2246.
  42. Alpago, D.; Zorzi, M.; Ferrante, A. Identification of sparse reciprocal graphical models. IEEE Control. Syst. Lett. 2018, 2, 659–664.
  43. Frot, B.; Nandy, P.; Maathuis, M.H. Robust causal structure learning with some hidden variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2019, 81, 459–487.
  44. Agrawal, R.; Squires, C.; Prasad, N.; Uhler, C. The DeCAMFounder: Non-Linear Causal Discovery in the Presence of Hidden Variables. arXiv 2021, arXiv:2102.07921.
  45. Brito, C.; Pearl, J. Generalized instrumental variables. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2002; pp. 85–93.
  46. Bollen, K.A. Structural Equations with Latent Variables; John Wiley & Sons: Hoboken, NJ, USA, 1989.
  47. Shimizu, S.; Hoyer, P.O.; Hyvärinen, A.; Kerminen, A. A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 2006, 7, 2003–2030.
  48. Kagan, A.M.; Rao, C.R.; Linnik, Y.V. Characterization Problems in Mathematical Statistics; John Wiley: New York, NY, USA, 1973.
  49. Fisher, R.A. Statistical Methods for Research Workers; Springer: Berlin/Heidelberg, Germany, 1950.
  50. Zhang, Q.; Filippi, S.; Gretton, A.; Sejdinovic, D. Large-scale kernel methods for independence testing. Stat. Comput. 2018, 28, 113–130.
  51. Skaaby, T.; Husemoen, L.L.N.; Martinussen, T.; Thyssen, J.P.; Melgaard, M.; Thuesen, B.H.; Pisinger, C.; Jørgensen, T.; Johansen, J.D.; Menné, T.; et al. Vitamin D status, filaggrin genotype, and cardiovascular risk factors: A Mendelian randomization approach. PLoS ONE 2013, 8, e57647.
  52. Martinussen, T.; Nørbo Sørensen, D.; Vansteelandt, S. Instrumental variables estimation under a structural Cox model. Biostatistics 2019, 20, 65–79.
  53. Silva, R.; Shimizu, S. Learning Instrumental Variables with Non-Gaussianity Assumptions: Theoretical Limitations and Practical Algorithms. arXiv 2015, arXiv:1511.02722.
  54. Hyvärinen, A.; Karhunen, J.; Oja, E. Independent Component Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2004; Volume 46.
  55. Hoyer, P.O.; Janzing, D.; Mooij, J.M.; Peters, J.; Schölkopf, B. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2009; pp. 689–696.
  56. Zhang, K.; Hyvärinen, A. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence; AUAI Press: Arlington, VA, USA, 2009; pp. 647–655.
  57. Peters, J.; Mooij, J.M.; Janzing, D.; Schölkopf, B. Causal Discovery with Continuous Additive Noise Models. J. Mach. Learn. Res. 2014, 15, 2009–2053.
Figure 1. A simple instrumental variable example where $X$ is the treatment, $Y$ is the outcome, and $Z$ is an IV relative to $X \to Y$.
Figure 2. A typical instrumental variable model where $X$ is the treatment, $Y$ is the outcome, and $Z$ is an IV conditioning on $\{W_1, W_2\}$ relative to $X \to Y$.
Figure 3. (a) $Z$ is a valid IV for the relation $X \to Y$ and (b) $Z$ is an invalid IV for the relation $X \to Y$.
Figure 4. Illustration of the fact that non-Gaussianity leads to dependence between an invalid IV $Z$ and the surrogate variable $Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X$. (a) Scatter plot of a valid IV $Z$ and the surrogate variable $Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X$. (b) Scatter plot of an invalid IV $Z$ and the surrogate variable $Y - \frac{\sigma_{YZ}}{\sigma_{XZ}} X$.
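The effect described in the Figure 4 caption can be reproduced on toy data. The following is a minimal sketch, not the paper's experimental setup: the graph, the unit coefficients, the uniform noise, and the helper names `simulate`, `surrogate`, and `squared_corr` are all assumptions made for illustration. The invalid instrument here shares a hypothetical latent confounder `u2` with the outcome. The correlation between squared values is used only as a crude stand-in for a proper independence test such as the kernel tests of [50].

```python
import numpy as np


def simulate(n, d, rng):
    """Draw (Z, X, Y) from a toy linear non-Gaussian model.

    d is the assumed strength of the edge U2 -> Y; d = 0 makes Z a valid IV.
    """
    def noise():
        # Unit-variance uniform noise, so every source is non-Gaussian.
        return rng.uniform(-np.sqrt(3), np.sqrt(3), n)

    u1, u2 = noise(), noise()        # unmeasured confounders
    z = u2 + noise()                 # candidate instrument
    x = z + u1 + noise()             # treatment
    y = x + u1 + d * u2 + noise()    # outcome; true effect of X on Y is 1
    return z, x, y


def surrogate(z, x, y):
    """Return Y - (sigma_YZ / sigma_XZ) * X using sample covariances."""
    sigma_yz = np.cov(y, z)[0, 1]
    sigma_xz = np.cov(x, z)[0, 1]
    return y - (sigma_yz / sigma_xz) * x


def squared_corr(a, b):
    """Correlation between squared, centered values (crude dependence check)."""
    a = a - a.mean()
    b = b - b.mean()
    return np.corrcoef(a**2, b**2)[0, 1]


rng = np.random.default_rng(0)
results = {}
for label, d in [("valid", 0.0), ("invalid", 1.0)]:
    z, x, y = simulate(50_000, d, rng)
    s = surrogate(z, x, y)
    results[label] = (np.corrcoef(z, s)[0, 1], squared_corr(z, s))
    print(f"{label:8s} linear corr = {results[label][0]: .4f}, "
          f"squared corr = {results[label][1]: .4f}")
```

In both cases the surrogate is linearly uncorrelated with $Z$ by construction, so second-order statistics cannot separate the two graphs. Only the higher-order check distinguishes them, and it does so only because the noise is non-Gaussian.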
Figure 5. Causal graph where $Z$ is a valid IV conditioning on $W_1$ relative to $X \to Y$ but an invalid IV conditioning on $W_2$ relative to $X \to Y$.
Figure 6. Causal graph where $Z$ is an invalid IV conditioning on $W_1$ relative to $X \to Y$ due to the nondirected path $Z \leftarrow U_2 \to Y$.
Figure 7. Causal graph where $Z$ is an invalid IV conditioning on an empty set relative to $X \to Y$ but $(\{Z\}, \{Y, X\})$ follows the GIN condition.
Figure 8. Causal graph where $Z$ is an invalid IV conditioning on an empty set relative to $X \to Y$ due to the directed path $Z \to Y$.
Figure 9. Three different scenarios used in our simulation studies.
Table 1. Performance of IV-GIN, sisVIVE, and IV-TETRAD on selecting valid IVs with different sample sizes.

| Scenario | Sample size | Correct-selecting rate ↑, IV-GIN (ours) | Correct-selecting rate ↑, sisVIVE | Correct-selecting rate ↑, IV-TETRAD | Selection commission ↓, IV-GIN (ours) | Selection commission ↓, sisVIVE | Selection commission ↓, IV-TETRAD |
|---|---|---|---|---|---|---|---|
| $S_1$ | 1k | 0.92 | 0.76 | 0.84 | 0.12 | 0.0 | 0.16 |
| $S_1$ | 3k | 0.95 | 0.81 | 0.96 | 0.03 | 0.0 | 0.04 |
| $S_1$ | 5k | 0.97 | 0.85 | 0.96 | 0.0 | 0.0 | 0.04 |
| $S_2$ | 1k | 0.9 | 0.92 | 0.03 | 0.03 | 0.08 | 0.0 |
| $S_2$ | 3k | 0.95 | 0.93 | 0.02 | 0.0 | 0.02 | 0.0 |
| $S_2$ | 5k | 1.0 | 0.94 | 0.0 | 0.0 | 0.0 | 0.0 |
| $S_3$ | 1k | 0.75 | 0.29 | 0.05 | 0.1 | 0.59 | 0.1 |
| $S_3$ | 3k | 0.86 | 0.2 | 0.02 | 0.05 | 0.7 | 0.05 |
| $S_3$ | 5k | 0.93 | 0.24 | 0.02 | 0.02 | 0.63 | 0.0 |

Note: ↑ means a higher value is better and ↓ means a lower value is better.
Table 2. Performance of IV-GIN, sisVIVE, and IV-TETRAD on selecting valid IVs with different effects of unmeasured confounders between treatment and outcome.

| Scenario | $\lambda$ | Correct-selecting rate ↑, IV-GIN (ours) | Correct-selecting rate ↑, sisVIVE | Correct-selecting rate ↑, IV-TETRAD | Selection commission ↓, IV-GIN (ours) | Selection commission ↓, sisVIVE | Selection commission ↓, IV-TETRAD |
|---|---|---|---|---|---|---|---|
| $S_1$ | 0.125 | 0.96 | 0.83 | 0.92 | 0.06 | 0.01 | 0.08 |
| $S_1$ | 0.25 | 0.85 | 0.72 | 0.86 | 0.01 | 0.0 | 0.01 |
| $S_2$ | 0.125 | 0.98 | 0.93 | 0.02 | 0.04 | 0.06 | 0.0 |
| $S_2$ | 0.25 | 0.92 | 0.91 | 0.0 | 0.08 | 0.1 | 0.0 |
| $S_3$ | 0.125 | 0.89 | 0.22 | 0.05 | 0.03 | 0.58 | 0.02 |
| $S_3$ | 0.25 | 0.85 | 0.2 | 0.03 | 0.07 | 0.61 | 0.0 |

Note: ↑ means a higher value is better and ↓ means a lower value is better.
Table 3. Performance of LRpSC+GES on selecting valid IVs with a sample size of 5k.

| Metric | Scenario $S_1$ | Scenario $S_2$ | Scenario $S_3$ |
|---|---|---|---|
| Correct-selecting rate ↑ | 0.1 | 0.1 | 0.09 |
| Selection commission ↓ | 0.0 | 0.12 | 0.3 |
Table 4. Performance of IV-GIN on selecting valid IVs with a sample size of 5k, where the locations of nodes $X$ and $Y$ are swapped.

| Metric | Scenario $S_1$ | Scenario $S_2$ | Scenario $S_3$ |
|---|---|---|---|
| Correct-selecting rate ↑ | 0.96 | 1.0 | 0.92 |
| Selection commission ↓ | 0.01 | 0.0 | 0.04 |
Table 5. Summary of the testability results using the IV-GIN conditions presented in our paper and the IV-TETRAD conditions presented in [21].

Testability of Instrument Criteria

| Method | Scenario $S_1$ | Scenario $S_1$ | Scenario $S_1$ |
|---|---|---|---|
| IV-GIN (ours) | Fully | Partially | None |
| IV-TETRAD | None | Fully | None |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Xie, F.; He, Y.; Geng, Z.; Chen, Z.; Hou, R.; Zhang, K. Testability of Instrumental Variables in Linear Non-Gaussian Acyclic Causal Models. Entropy 2022, 24, 512. https://doi.org/10.3390/e24040512

