Abstract
We study the identification of causal effects, motivated by two improvements to identifiability that can be attained if one knows that some variables in a causal graph are functionally determined by their parents (without needing to know the specific functions). First, an unidentifiable causal effect may become identifiable when certain variables are functional. Secondly, certain functional variables can be excluded from being observed without affecting the identifiability of a causal effect, which may significantly reduce the number of needed variables in observational data. Our results are largely based on an elimination procedure that removes functional variables from a causal graph while preserving key properties in the resulting causal graph, including the identifiability of causal effects. Our treatment of functional dependencies in this context mandates a formal, systematic, and general treatment of positivity assumptions, which are prevalent in the literature on causal effect identifiability and which interact with functional dependencies, leading to another contribution of the presented work.
1. Introduction
A causal effect measures the impact of an intervention on some events of interest and is exemplified by the question, “What is the probability that a patient would recover had they taken a drug?”. This type of question, also known as an interventional query, belongs to the second rung of Pearl’s causal hierarchy [1], so it ultimately requires experimental studies if it is to be estimated from data. However, it is well known that such interventional queries can sometimes be answered based on observational queries (first rung of the causal hierarchy), which can be estimated from observational data. This becomes very significant when experimental studies are either not available, expensive to conduct, or would entail ethical concerns. Hence, a key question in causal inference asks when and how a causal effect can be estimated from available observational data, assuming a causal graph is provided [2].
More precisely, given a set of treatment variables ($\mathbf{X}$) and a set of outcome variables ($\mathbf{Y}$), the causal effect of $\mathbf{X}$ on $\mathbf{Y}$, denoted as $Pr_{\mathbf{x}}(\mathbf{Y})$ or $Pr(\mathbf{Y} \mid do(\mathbf{x}))$, is the marginal probability on $\mathbf{Y}$ when an intervention sets the states of variables ($\mathbf{X}$) to $\mathbf{x}$. The problem of identifying a causal effect studies whether $Pr_{\mathbf{x}}(\mathbf{Y})$ can be uniquely determined from a causal graph and a distribution ($Pr(\mathbf{O})$) over some variables ($\mathbf{O}$) in the causal graph [2], where $Pr(\mathbf{O})$ is typically estimated from observational data. The causal effect is guaranteed to be identifiable if $\mathbf{O}$ corresponds to all variables in the causal graph (with some positivity assumptions), that is, if all variables in the causal graph are observed. When some variables are hidden (unobserved), it is possible that different parameterizations of the causal graph will induce the same distribution ($Pr(\mathbf{O})$) but different values for the causal effect ($Pr_{\mathbf{x}}(\mathbf{Y})$), which leads to unidentifiability. In the past few decades, a significant amount of effort has been devoted to studying the identifiability of causal effects (see, e.g., [2,3,4,5,6,7]). Some early works include the back-door criterion [2,8] and the front-door criterion [2,3]. These criteria are sound but incomplete, as they may fail to identify certain causal effects that are, indeed, identifiable. Complete identification methods include the do-calculus method [2], the identification algorithm presented in [9], and the ID algorithm proposed in [10]. These methods require some positivity assumptions (constraints) on the observational distribution ($Pr(\mathbf{O})$) and can derive an identifying formula that computes the causal effect based on $Pr(\mathbf{O})$ when the causal effect is identifiable. Some recent works take a different approach by first estimating the parameters of a causal graph to obtain a fully specified causal model, which is then used to estimate causal effects through inference [11,12,13,14]. Further works focus on the efficiency of estimating causal effects from finite data, e.g., [15,16,17,18].
One main challenge of these algorithms is that they try to identify causal effects from limited information in the form of a causal graph and data on observed variables ($\mathbf{O}$). This becomes a problem when only a small number of variables ($\mathbf{O}$) is observed, since $Pr(\mathbf{O})$ alone may not provide enough information for deciding the values of causal effects. Such scenarios happen when the collection of data on some variables is infeasible, e.g., if these variables (such as gender and age) involve confidential information. A recent line of work mitigates this problem by studying the impact of additional information on identifiability beyond causal graphs and observational data. For example, Tikka et al. [19] showed that certain unidentifiable causal effects can become identifiable given information about context-specific independence. Our work in this paper follows the same direction, as we consider the problem of causal effect identification in the presence of a particular type of qualitative knowledge called functional dependencies [20]. We say there is a functional dependency between a variable ($X$) and its parents ($\mathbf{P}$) in the causal graph if the distribution ($Pr(X \mid \mathbf{P})$) is deterministic but we do not know the distribution itself (i.e., the specific values of $Pr(x \mid \mathbf{p})$). In this case, we also say that variable $X$ is functional. Previous works have shown that functional dependencies can be exploited to improve the efficiency of Bayesian network inference [13,21,22,23,24]. We complement these works by showing that functional dependencies can also be exploited to improve the identifiability of causal effects, especially in the presence of hidden variables. In particular, we show that some unidentifiable causal effects may become identifiable, given such functional dependencies; propose techniques for testing identifiability in this context; and highlight other implications of such dependencies on the practice of identifiability.
Consider the following motivational example where we are interested in how the enforcement of speed limits may affect car accidents. The driving age (A) is functionally determined by country (C), driving age and country are causes of speed (X), and speed and driving age are causes of accidents (Y). The DAG on the right captures the causal relations among these variables, where variable A is circled to indicate it is functional. Furthermore, suppose that variables C, X, and Y are observed (variable A is hidden). According to classical cause–effect identification methods (e.g., do-calculus and ID algorithm), the causal effect of X on Y is unidentifiable in this case. However, if we take into account that variable A is a function of C, which restricts the class of distributions under consideration, then the causal effect of X on Y becomes identifiable. This exemplifies the improvements to identifiability pursued in this paper.

Consider a causal graph (G) and a distribution ($Pr(\mathbf{O})$) over the observed variables ($\mathbf{O}$) in G. To check the identifiability of a causal effect, it is standard to first apply the projection operation proposed in [25,26], which constructs another causal graph ($G'$) whose non-root variables all come from $\mathbf{O}$, followed by the application of an identification algorithm to $G'$, like the ID algorithm [10]. We call this two-stage procedure project-ID. One restriction of project-ID is that it is applicable only under some positivity constraints (assumptions), such as strict positivity ($Pr(\mathbf{O}) > 0$), which preclude some events from having a zero probability. Nevertheless, these positivity constraints are not always satisfiable in practice and may contradict functional dependencies. For example, if Y is a function of X, then the positivity constraint ($Pr(X, Y) > 0$) never holds. To systematically treat this interaction between positivity constraints and functional dependencies, we formulate the notion of constrained identifiability, which takes positivity constraints as an input (in addition to the causal graph (G) and distribution ($Pr(\mathbf{O})$)). We also formulate the notion of functional identifiability, which further takes functional dependencies as an input. This allows us to explicitly treat the interactions between positivity constraints and functional dependencies, which is needed for the combination of classical methods like project-ID with the results we present in this paper.
The paper is structured as follows. We start with some technical preliminaries in Section 2. We formally define positivity constraints and functional dependencies in Section 3, where we also introduce the problems of constrained and functional identifiability. Section 4 introduces two primitive operations, functional elimination and functional projection, which are needed for later treatments. Section 5 presents our core results on functional identifiability and how they can be combined with existing identifiability algorithms. We conduct experiments to evaluate the effectiveness of functional dependencies on cause–effect identifiability in Section 6. Finally, we close with concluding remarks in Section 7. Proofs of all results are included in Appendix C. This paper is an extended version of [27].
2. Technical Preliminaries
We consider discrete variables in this work. Single variables are denoted by uppercase letters (e.g., $X$), and their states are denoted by lowercase letters (e.g., $x$). Sets of variables are denoted by bold uppercase letters (e.g., $\mathbf{X}$), and their instantiations (sets of values) are denoted by bold lowercase letters (e.g., $\mathbf{x}$).
2.1. Causal Bayesian Networks and Interventions
A Causal Bayesian Network (CBN) is a pair ($G, \Theta$), where G is a causal graph in the form of a directed acyclic graph (DAG) and $\Theta$ is a set of conditional probability tables (CPTs). We have one CPT for each variable (X) with parents ($\mathbf{P}$) in G, denoted as $\theta_{X|\mathbf{P}}$, which specifies the conditional probability distributions ($Pr(X \mid \mathbf{P})$). It follows that every CPT ($\theta_{X|\mathbf{P}}$) satisfies the following properties: $\theta_{x|\mathbf{p}} \geq 0$ for all instantiations ($x, \mathbf{p}$) and $\sum_x \theta_{x|\mathbf{p}} = 1$ for each instantiation ($\mathbf{p}$). For simplicity, we also denote $\theta_{x|\mathbf{p}}$ as $Pr(x \mid \mathbf{p})$.
A CBN induces a joint distribution over its variables ($\mathbf{V}$), which is exactly the product of its CPTs, i.e., $Pr(\mathbf{v}) = \prod \theta_{x|\mathbf{p}}$, where the product is over the CPT entries consistent with instantiation $\mathbf{v}$. In the CBN shown in Figure 1a, for example, the joint distribution is obtained by multiplying the CPTs of all its variables. Applying a treatment ($\mathbf{X} = \mathbf{x}$) to the joint distribution yields a new distribution called the interventional distribution, denoted as $Pr_{\mathbf{x}}$. One way to compute the interventional distribution is to consider the mutilated CBN that is constructed from the original CBN as follows: Remove from G all edges that point to variables in $\mathbf{X}$; then, replace the CPT ($\theta_{X|\mathbf{P}}$) in $\Theta$ for each $X \in \mathbf{X}$ with a CPT ($\theta_X$, where $\theta_x = 1$ if $x$ is consistent with $\mathbf{x}$ and $\theta_x = 0$ otherwise). Figure 1a depicts a causal graph (G), and Figure 1b depicts the mutilated causal graph ($G_{\mathbf{x}}$) under a treatment ($\mathbf{x}$). The interventional distribution ($Pr_{\mathbf{x}}$) is the distribution induced by the mutilated CBN, where $Pr_{\mathbf{x}}(\mathbf{Y})$ corresponds to the causal effect ($Pr_{\mathbf{x}}(\mathbf{Y})$, also notated by $Pr(\mathbf{Y} \mid do(\mathbf{x}))$). In this example, the causal effect of $\mathbf{x}$ on Y can be computed by marginalizing the distribution induced by the mutilated CBN onto Y.
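To make the mutilation construction concrete, the following is a minimal Python sketch (our illustration; the two-variable network, the dict-based CPT encoding, and all names are assumptions rather than anything prescribed by the text). It computes an interventional probability by cutting the treated variable's incoming edges, fixing its CPT, and marginalizing the induced joint distribution.

```python
from itertools import product

# A CBN as a list of (variable, parents, CPT) in topological order.
# A CPT maps (parent instantiation, value) to a probability.
cbn = [
    ("X", (), {((), 0): 0.4, ((), 1): 0.6}),
    ("Y", ("X",), {((0,), 0): 0.9, ((0,), 1): 0.1,
                   ((1,), 0): 0.2, ((1,), 1): 0.8}),
]

def joint(cbn):
    """Induced distribution: the product of all CPTs (binary variables)."""
    names = [v for v, _, _ in cbn]
    dist = {}
    for world in product([0, 1], repeat=len(names)):
        w = dict(zip(names, world))
        p = 1.0
        for var, parents, cpt in cbn:
            p *= cpt[(tuple(w[q] for q in parents), w[var])]
        dist[world] = p
    return names, dist

def intervene(cbn, var, value):
    """Mutilate: cut var's parents and fix its CPT to the treated value."""
    return [(v, () if v == var else ps,
             {((), 0): float(value == 0), ((), 1): float(value == 1)}
             if v == var else cpt)
            for v, ps, cpt in cbn]

names, dist = joint(intervene(cbn, "X", 1))
print(sum(p for world, p in dist.items()
          if dict(zip(names, world))["Y"] == 1))  # Pr(Y=1 | do(X=1)) = 0.8
```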
Figure 1.
Mutilated and projected graphs of a causal graph. Hidden variables are circled. A bidirected edge ($X \leftrightarrow Y$) is a compact notation for $X \leftarrow H \rightarrow Y$, where H is an auxiliary hidden variable. (a) Causal graph; (b) mutilated graph; (c) projected graph.
2.2. Identifying Causal Effects
A key question in causal inference is to check whether a causal effect can be (uniquely) computed given the causal graph (G) and a distribution ($Pr(\mathbf{O})$) over a subset ($\mathbf{O}$) of its variables. If the answer is yes, we say that the causal effect is identifiable, given G and $Pr(\mathbf{O})$. Otherwise, the causal effect is unidentifiable. Variables ($\mathbf{O}$) are said to be observed, and the remaining variables are said to be hidden, where $Pr(\mathbf{O})$ is usually estimated from observational data. We start with the general definition of identifiability (not necessarily for causal effects) from [2] (Ch. 3.2.4), with a slight rephrasing.
Definition 1
(Identifiability [2]). Let $Q(M)$ be any computable quantity of a model (M). We say that Q is identifiable in a class $\mathcal{M}$ of models if, for any pair of models ($M_1$ and $M_2$) from this class, $Q(M_1) = Q(M_2)$ whenever $Pr_{M_1}(\mathbf{O}) = Pr_{M_2}(\mathbf{O})$, where $\mathbf{O}$ represents the observed variables.
In the context of causal effects, the problem of identifiability is to check whether every pair of fully specified CBNs ($M_1$ and $M_2$ in Definition 1) that induces the same distribution ($Pr(\mathbf{O})$) also produces the same value for the causal effect. Equivalently, to show that a causal effect is unidentifiable, it suffices to find two CBNs that induce the same distribution ($Pr(\mathbf{O})$) yet different causal effects. Note that Definition 1 does not restrict the considered models ($M_1$ and $M_2$) based on the properties of the distributions ($Pr_{M_1}(\mathbf{O})$ and $Pr_{M_2}(\mathbf{O})$). However, in the literature on identifying causal effects, it is quite common to only consider CBNs (models) that induce distributions that satisfy some positivity constraints, such as $Pr(\mathbf{O}) > 0$. We examine such constraints more carefully in Section 3, as they may contradict functional dependencies, which we introduce later.
It is well known that under some positivity constraints (e.g., $Pr(\mathbf{O}) > 0$), the identifiability of causal effects can be efficiently tested using what we call the project-ID algorithm. Given a causal graph (G), project-ID first applies the projection operation proposed in [25,26,28] to yield a new causal graph ($G'$) whose hidden variables are all roots, each with exactly two children. These properties are needed by the ID algorithm [10], which is then applied to $G'$ to yield an identifying formula if the causal effect is identifiable and an outcome of FAIL otherwise. Consider the causal effect ($Pr_x(y)$) in Figure 1a, where the hidden variables are the non-root variables. We first project the causal graph (G) in Figure 1a onto its observed variables to yield the causal graph ($G'$) in Figure 1c (all hidden variables in $G'$ are auxiliary and roots). We then run the ID algorithm on $G'$, which returns a (simplified) identifying formula. Hence, the causal effect ($Pr_x(y)$) is identifiable and can be computed using that formula. Moreover, all quantities in the formula can be obtained from the distribution ($Pr(\mathbf{O})$) over observed variables, which can be estimated from observational data. More details on the projection operation and the ID algorithm can be found in Appendix A.
3. Constrained and Functional Identifiability
As mentioned earlier, Definition 1 of identifiability [2] (Ch. 3.2.4) does not restrict the pair of considered models ($M_1$ and $M_2$). However, it is common in the literature on cause–effect identifiability to only consider CBNs with distributions ($Pr(\mathbf{O})$) that satisfy some positivity constraints. Strict positivity ($Pr(\mathbf{O}) > 0$) is, perhaps, the most widely used constraint [2,9,28]; that is, in Definition 1, we only consider CBNs $M_1$ and $M_2$, which induce distributions $Pr_{M_1}(\mathbf{O})$ and $Pr_{M_2}(\mathbf{O})$ that satisfy $Pr_{M_1}(\mathbf{O}) > 0$ and $Pr_{M_2}(\mathbf{O}) > 0$, respectively. Weaker and somewhat intricate positivity constraints were employed by the ID algorithm in [10] as discussed in Appendix A, but we apply this algorithm only under strict positivity to keep things simple (see [29,30] for a recent discussion of positivity constraints).
Positivity constraints are motivated by two considerations: technical convenience and the fact that most causal effects would be unidentifiable without some positivity constraints (more on this later). Given the multiplicity of positivity constraints considered in the literature and the subtle interaction between positivity constraints and functional dependencies (which are the main focus of this work), we next provide a systematic treatment of identifiability under positivity constraints.
3.1. Positivity Constraints
We first formalize the notion of a positivity constraint, then define the notion of constrained identifiability, which takes a set of positivity constraints as input (in addition to the causal graph (G) and distribution ($Pr(\mathbf{O})$)).
Definition 2.
A positivity constraint on $Pr(\mathbf{O})$ is an inequality of the form $Pr(\mathbf{S} \mid \mathbf{T}) > 0$, where $\mathbf{S} \subseteq \mathbf{O}$ and $\mathbf{T} \subseteq \mathbf{O}$, that is, for all instantiations ($\mathbf{s}, \mathbf{t}$), if $Pr(\mathbf{t}) > 0$, then $Pr(\mathbf{s} \mid \mathbf{t}) > 0$.
When $\mathbf{T} = \emptyset$, the positivity constraint is defined on a marginal distribution ($Pr(\mathbf{S}) > 0$). To illustrate, a positivity constraint ($Pr(\mathbf{S} \mid \mathbf{T}) > 0$) over variables in Figure 1a specifies the constraint whereby $Pr(\mathbf{s} \mid \mathbf{t}) > 0$ if $Pr(\mathbf{t}) > 0$ for every instantiation ($\mathbf{s}, \mathbf{t}$). We may impose multiple positivity constraints on a set of variables ($\mathbf{O}$). We use $\mathbb{C}$ to denote the set of positivity constraints imposed on $Pr(\mathbf{O})$ and $vars(\mathbb{C})$ to denote all the variables mentioned by $\mathbb{C}$. Consider the constraints expressed as $\mathbb{C} = \{Pr(A) > 0, Pr(B \mid C) > 0\}$; then, $vars(\mathbb{C}) = \{A, B, C\}$. The weakest set of positivity constraints is $\mathbb{C} = \emptyset$ (no positivity constraints, as in Definition 1), and the strongest positivity constraint is $Pr(\mathbf{O}) > 0$ (strict positivity).
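As a concrete illustration (a sketch of our own, not part of the paper's formalism; the tabulated-distribution encoding and all names are assumptions), a constraint of the form $Pr(\mathbf{S} \mid \mathbf{T}) > 0$ can be checked directly on a tabulated distribution over binary variables:

```python
from itertools import product

def satisfies(dist, names, s_vars, t_vars):
    """Check the positivity constraint Pr(S | T) > 0: for every
    instantiation, Pr(t) > 0 must imply Pr(s, t) > 0, so that
    Pr(s | t) is defined and positive."""
    def marg(assign):  # marginal probability of a partial assignment
        return sum(p for world, p in dist.items()
                   if all(world[names.index(v)] == val
                          for v, val in assign.items()))
    for vals in product([0, 1], repeat=len(s_vars) + len(t_vars)):
        s = dict(zip(s_vars, vals[:len(s_vars)]))
        t = dict(zip(t_vars, vals[len(s_vars):]))
        if marg(t) > 0 and marg({**s, **t}) == 0:
            return False
    return True

# Y is a function of X below, so strict positivity Pr(X, Y) > 0 fails,
# while the weaker constraint Pr(X) > 0 holds.
names = ["X", "Y"]
dist = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
print(satisfies(dist, names, ["X", "Y"], []))  # False
print(satisfies(dist, names, ["X"], []))       # True
```

The example also illustrates the interaction noted in Section 1: a functional dependency of Y on X contradicts strict positivity but not the weaker constraint $Pr(X) > 0$.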
We next provide a definition of identifiability for the causal effect of treatments ($\mathbf{X}$) on outcomes ($\mathbf{Y}$) in which positivity constraints are an input to the identifiability problem. We call it constrained identifiability, in contrast to the (unconstrained) identifiability of Definition 1.
Definition 3.
We call $(G, \mathbf{O}, \mathbb{C})$ an identifiability tuple, where G is a causal graph (DAG), $\mathbf{O}$ is its set of observed variables, and $\mathbb{C}$ is a set of positivity constraints.
Definition 4
(Constrained Identifiability). Let $(G, \mathbf{O}, \mathbb{C})$ be an identifiability tuple. The causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is said to be identifiable with respect to $(G, \mathbf{O}, \mathbb{C})$ if $Pr^1_{\mathbf{x}}(\mathbf{Y}) = Pr^2_{\mathbf{x}}(\mathbf{Y})$ for any pair of distributions ($Pr^1$ and $Pr^2$) that are induced by G and that satisfy $Pr^1(\mathbf{O}) = Pr^2(\mathbf{O})$, as well as the positivity constraints ($\mathbb{C}$).
For simplicity, we say “identifiability” to mean “constrained identifiability” in the rest of this paper. We next show that without some positivity constraints, most causal effects would not be identifiable. We first define a notion called first ancestor on a causal graph as follows. We say that a treatment ($X \in \mathbf{X}$) is a first ancestor of some outcome ($Y \in \mathbf{Y}$) if X is an ancestor of Y in causal graph G and there exists a directed path from X to Y that is not intercepted by $\mathbf{X} \setminus \{X\}$. Consider the causal graph in Figure 2a with hidden variable U; one treatment there is a first ancestor of an outcome, while another outcome does not have any first ancestor. A first ancestor must exist if some treatment variable is an ancestor of some outcome variable. The following result states a criterion under which a causal effect is never identifiable.
Figure 2.
Examples for positivity.
Proposition 1.
The causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is not identifiable with respect to an identifiability tuple $(G, \mathbf{O}, \mathbb{C})$ if some $X \in \mathbf{X}$ is a first ancestor of some $Y \in \mathbf{Y}$ and $\mathbb{C}$ does not imply $Pr(X) > 0$.
Hence, identifiability is not possible without some positivity constraints if at least one treatment variable is an ancestor of some outcome variable (which is common). According to Proposition 1, the causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is not identifiable in Figure 2a if the considered distributions do not satisfy $Pr(X) > 0$, as the treatment X is a first ancestor of the outcome Y.
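Testing whether a treatment is a first ancestor of an outcome amounts to searching for a directed path that avoids the other treatments. Below is a minimal sketch (our own code; the adjacency-list encoding and names are assumptions):

```python
def is_first_ancestor(children, x, y, treatments):
    """True if there is a directed path from x to y whose internal
    nodes avoid all treatment variables other than x."""
    blocked = set(treatments) - {x}
    stack, seen = [x], {x}
    while stack:
        v = stack.pop()
        for c in children.get(v, ()):
            if c == y:
                return True
            if c not in seen and c not in blocked:
                seen.add(c)
                stack.append(c)
    return False

# Chain X1 -> X2 -> Y: X1 is not a first ancestor of Y (X2 intercepts
# every directed path), but X2 is.
children = {"X1": ["X2"], "X2": ["Y"]}
print(is_first_ancestor(children, "X1", "Y", {"X1", "X2"}))  # False
print(is_first_ancestor(children, "X2", "Y", {"X1", "X2"}))  # True
```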
As positivity constraints become stronger, more causal effects are likely to become identifiable, since the set of considered models becomes smaller; that is, an unidentifiable causal effect under positivity constraints ($\mathbb{C}_1$) may become identifiable under positivity constraints ($\mathbb{C}_2$) if $\mathbb{C}_2$ implies $\mathbb{C}_1$. Consider the causal graph in Figure 2b, in which all variables are observed ($\mathbf{O}$ contains all variables). Without positivity constraints ($\mathbb{C} = \emptyset$), the causal effect of X on Y is not identifiable. However, it becomes identifiable given strict positivity ($Pr(\mathbf{O}) > 0$), leading to an identifying formula. This causal effect is also identifiable under a weaker positivity constraint. In this example, the weaker positivity assumption is sufficient to make the identifying formula well defined because the conditional term in the formula is equal to zero when its conditioning event has zero probability and is computable otherwise (a conditional probability $Pr(\mathbf{s} \mid \mathbf{t})$ is well defined if $Pr(\mathbf{t}) > 0$). This is an example where strict positivity may be assumed for technical convenience only, as it may facilitate the application of some identifiability techniques like do-calculus [2].
3.2. Functional Dependencies
A variable (X) in a causal graph is said to functionally depend on its parents ($\mathbf{P}$) if its distribution is deterministic ($Pr(x \mid \mathbf{p}) \in \{0, 1\}$) for every instantiation ($x, \mathbf{p}$). Variable X is also said to be functional in this case. In this work, we assume qualitative functional dependencies: we do not know the distribution ($Pr(X \mid \mathbf{P})$); we only know that it is deterministic. We assume that root variables cannot be functional, as such variables would be constants and can be removed from the causal graph.
The table below shows two variables (B and C) that both have A as their parent. Variable C is functional, but variable B is not. The CPT for variable C is called a functional CPT in this case. Functional CPTs are also known as (causal) mechanisms and are expressed using structural equations in structural causal models (SCMs) [31,32,33]. By definition, in an SCM, every non-root variable is assumed to be functional (when noise variables are represented explicitly in the causal graph).
| $A$ | $B$ | $C$ | $Pr(B \mid A)$ | $Pr(C \mid A)$ |
| 0 | 0 | 0 | 0.2 | 0 |
| 0 | 1 | 1 | 0.8 | 1 |
| 1 | 0 | 0 | 0.6 | 1 |
| 1 | 1 | 1 | 0.4 | 0 |
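Determinism of a CPT is a purely syntactic property, as the following minimal check illustrates (our own sketch; the dict encoding of the two CPTs from the table above is an assumption):

```python
def is_functional(cpt):
    """A CPT is functional iff every entry is 0 or 1, i.e., each
    parent instantiation maps to exactly one state with probability 1."""
    return all(p in (0.0, 1.0) for p in cpt.values())

# CPTs from the table above, keyed by (parent value, child value).
cpt_b = {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.4}  # Pr(B | A)
cpt_c = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}  # Pr(C | A)
print(is_functional(cpt_b), is_functional(cpt_c))  # False True
```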
Qualitative functional dependencies are a longstanding concept. For example, they are common in relational databases (see, e.g., [34,35]), and their relevance to probabilistic reasoning was previously brought up in [20] (Ch. 3). One example of a (qualitative) functional dependency is that different countries have different driving ages, so we know that “driving age” functionally depends on “country”, even though we may not know the specific driving age for each country. Another example is that a “Letter grade” for a class is functionally dependent on the student’s “weighted average”, even though we may not know the scheme for converting a weighted average to a letter grade.
In this work, we assume that we are given a causal graph (G) in which some variables ($\mathbf{W}$) have been designated as functional. The presence of functional variables further restricts the set of distributions (Pr) that we consider when checking identifiability. This leads to a more refined problem that we call functional identifiability (F-identifiability), which depends on four elements.
Definition 5.
We call $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ an F-identifiability tuple when G is a DAG, $\mathbf{O}$ is its set of observed variables, $\mathbb{C}$ is a set of positivity constraints, and $\mathbf{W}$ is a set of functional variables in G.
Definition 6
(F-Identifiability). Let $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ be an F-identifiability tuple. The causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is F-identifiable with respect to $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ if $Pr^1_{\mathbf{x}}(\mathbf{Y}) = Pr^2_{\mathbf{x}}(\mathbf{Y})$ for any pair of distributions ($Pr^1$ and $Pr^2$) that are induced by G, that satisfy $Pr^1(\mathbf{O}) = Pr^2(\mathbf{O})$ and the positivity constraints ($\mathbb{C}$), and in which variables ($\mathbf{W}$) functionally depend on their parents.
Both $\mathbb{C}$ and $\mathbf{W}$ represent constraints on the models (CBNs) we consider when checking identifiability, and these two types of constraints may contradict each other. We next define two notions that characterize some important interactions between positivity constraints and functional variables.
Definition 7.
Let $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ be an F-identifiability tuple. Then, $\mathbb{C}$ and $\mathbf{W}$ are consistent if there exists a parameterization for G that induces a distribution satisfying $\mathbb{C}$ and in which variables ($\mathbf{W}$) functionally depend on their parents. Moreover, $\mathbb{C}$ and $\mathbf{W}$ are separable if $vars(\mathbb{C}) \cap \mathbf{W} = \emptyset$.
If $\mathbb{C}$ is inconsistent with $\mathbf{W}$, then the set of distributions (Pr) considered in Definition 6 is empty; hence, the causal effect is not well defined (and trivially identifiable according to Definition 6). As such, one would usually want to ensure such consistency. Here are some examples of positivity constraints that are always consistent with a set of functional variables ($\mathbf{W}$): positivity for each treatment variable, i.e., $Pr(X) > 0$ for each $X \in \mathbf{X}$; positivity for the set of non-functional treatments, i.e., $Pr(\mathbf{X} \setminus \mathbf{W}) > 0$; and positivity for all non-functional variables, i.e., $Pr(\mathbf{O} \setminus \mathbf{W}) > 0$. It turns out that all these examples are special cases of the following condition. For a functional variable ($W \in \mathbf{W}$), let $\mathbf{A}$ be variables that intercept all directed paths from non-functional variables to W (such an $\mathbf{A}$ may not be unique). If none of the positivity constraints in $\mathbb{C}$ mentions both W and $\mathbf{A}$, then $\mathbb{C}$ and $\mathbf{W}$ are guaranteed to be consistent (see Proposition A4 in Appendix C).
Separability is a stronger condition, and it intuitively implies that the positivity constraints do not rule out any possible functions for the variables in $\mathbf{W}$. We need such a condition for one of the results we present later. Some examples of positivity constraints that are separable from $\mathbf{W}$ are $Pr(\mathbf{X} \setminus \mathbf{W}) > 0$ and $Pr(\mathbf{O} \setminus \mathbf{W}) > 0$. Studying the interactions between positivity constraints and functional variables, as we do in this section, will prove helpful later when utilizing existing identifiability algorithms (which require positivity constraints) for the testing of functional identifiability.
4. Functional Elimination and Projection
Our approach for testing identifiability under functional dependencies is based on the elimination of functional variables from the causal graph, followed by the invocation of the project-ID algorithm on the resulting graph. This can be subtle, though, since the described process does not work for every functional variable, as we discuss in the next section. Moreover, one needs to handle the interaction between positivity constraints and functional variables carefully. However, the first step is to formalize the process of eliminating a functional variable and to study the associated guarantees.
Eliminating variables from a probabilistic model is a well-studied operation also known as marginalization (see, e.g., [36,37,38]). When eliminating variable X from a model that represents distribution $Pr$, the goal is to obtain a model that represents the marginal distribution ($\sum_x Pr$). Elimination can also be applied to a DAG (G) that represents conditional independencies, leading to a new DAG ($G'$) that represents independencies (over the remaining variables) that are implied by G. In fact, the projection operation we discussed earlier [25,26] can be understood in these terms. We next propose an operation that eliminates functional variables from a DAG and that comes with stronger guarantees compared to earlier elimination operations as far as preserving independencies.
Definition 8.
The functional elimination of a variable (X) from a DAG (G) yields a new DAG attained by adding an edge from each parent of X to each child of X, then removing X from G.
Appendix B extends this definition to causal Bayesian networks (i.e., updating both CPTs and the causal graph). For convenience, we sometimes say “elimination” to mean “functional elimination” when the context is clear. From the viewpoint of independence relations, functional elimination is not sound if the eliminated variable is not functional. In particular, the DAG ($G'$) that results from this elimination process may satisfy independencies (identified by d-separation) that do not hold in the original DAG (G). As we show later, however, every independence implied by $G'$ must be implied by G if the eliminated variable is functional. In the context of SCMs, functional elimination may be interpreted as replacing the eliminated variable (X) with its function in all structural equations that contain X. Functional elimination applies in broader contexts than SCMs, though. Eliminating multiple functional variables in any order yields the same DAG (see Proposition A3 in Appendix B). For example, eliminating variables C and D from the DAG in Figure 3a yields the DAG in Figure 3c whether we use the order (C, D) or the order (D, C).
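The graph operation of Definition 8 is straightforward to implement; here is a minimal sketch (ours; the parents-dict encoding is an assumption):

```python
def functionally_eliminate(parents, x):
    """Definition 8: add an edge from each parent of x to each child
    of x, then remove x. `parents` maps each node to its parent set."""
    new = {v: set(ps) for v, ps in parents.items() if v != x}
    for v, ps in new.items():
        if x in ps:                # v is a child of x
            ps.discard(x)
            ps.update(parents[x])  # v inherits x's parents
    return new

# A -> C, B -> C, C -> D: eliminating C yields A -> D and B -> D.
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
print(functionally_eliminate(dag, "C"))
# {'A': set(), 'B': set(), 'D': {'A', 'B'}} (set order may vary)
```

Applying this function repeatedly to several functional variables yields the same graph in any order, in line with Proposition A3.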
Figure 3.
Contrasting projection with functional projection. C and D are functional. Hidden variables are circled. (a) DAG; (b) proj. of (a) on A, B, G, H, I; (c) eliminate C and D from (a); (d) proj. of (c) on A, B, G, H, I.
Functional elimination preserves independencies that hold in the original DAG and that are not preserved by other elimination methods, including projection, as defined in [25,26]. These independencies are captured using the notion of D-separation [39,40], which is more refined than the classical notion of d-separation [41,42] (uppercase D versus lowercase d). The original definition of D-separation can be found in [40]. We provide a simpler definition next, stated as Proposition 2, as the equivalence between the two definitions is not immediate.
Proposition 2.
Let $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ be disjoint variable sets and $\mathbf{W}$ be a set of functional variables in DAG G. Then, $\mathbf{X}$ and $\mathbf{Y}$ are D-separated by $\mathbf{Z}$ in G iff $\mathbf{X}$ and $\mathbf{Y}$ are d-separated by $\mathbf{Z}^+$ in G, where $\mathbf{Z}^+$ is obtained as follows. Initially, $\mathbf{Z}^+ = \mathbf{Z}$. The next step is repeated until $\mathbf{Z}^+$ stops changing: every variable in $\mathbf{W}$ whose parents are all in $\mathbf{Z}^+$ is added to $\mathbf{Z}^+$.
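The closure $\mathbf{Z}^+$ can be computed with a simple fixed-point loop, as in the following sketch (our own encoding; the toy graph is an assumption and is not Figure 3a):

```python
def closure(parents, z, functional):
    """Grow z into z+ by repeatedly adding any functional variable
    whose parents are all already in z+ (as in Proposition 2)."""
    zp = set(z)
    changed = True
    while changed:
        changed = False
        for w in functional:
            if w not in zp and parents[w] <= zp:
                zp.add(w)
                changed = True
    return zp

# Toy chain A -> C -> D with C and D functional: conditioning on {A}
# effectively conditions on {A, C, D}, since C is fixed by A and D by C.
parents = {"A": set(), "C": {"A"}, "D": {"C"}}
print(closure(parents, {"A"}, {"C", "D"}))  # {'A', 'C', 'D'} (order may vary)
```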
To illustrate the difference between d-separation and D-separation, consider, again, the DAG in Figure 3a and assume that variables C and D are functional. Variables G and I are not d-separated by A, but they are D-separated by A, that is, there are distributions that are induced by the DAG in Figure 3a and in which G and I are not independent given A. However, G and I are independent given A in every induced distribution in which variables C and D are functionally determined by their parents. Functional elimination preserves D-separation in the following sense.
Theorem 1.
Consider a DAG (G) with functional variables ($\mathbf{W}$). Let $G'$ be the result of functionally eliminating variables ($\mathbf{W}' \subseteq \mathbf{W}$) from G. For any disjoint sets ($\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$) in $G'$, $\mathbf{X}$ and $\mathbf{Y}$ are D-separated by $\mathbf{Z}$ in G iff $\mathbf{X}$ and $\mathbf{Y}$ are D-separated by $\mathbf{Z}$ in $G'$.
The above result is stated with respect to eliminating a subset of the functional variables. If we eliminate all functional variables, then D-separation is reduced to d-separation. For example, variables G and I are D-separated by A in Figure 3c and in Figure 3a as suggested by Theorem 1. In fact, G and I are also d-separated by A in Figure 3c, since we eliminated all functional variables. We now have the following stronger result.
Corollary 1.
Consider a DAG (G) with functional variables ($\mathbf{W}$). Let $G'$ be the result of functionally eliminating all variables ($\mathbf{W}$) from G. For any disjoint sets ($\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$) in $G'$, $\mathbf{X}$ and $\mathbf{Y}$ are d-separated by $\mathbf{Z}$ in $G'$ iff $\mathbf{X}$ and $\mathbf{Y}$ are D-separated by $\mathbf{Z}$ in G.
We now define the operation of functional projection, which augments the original projection operation proposed in [25,26] in the presence of functional dependencies.
Definition 9.
Let G be a DAG, $\mathbf{O}$ be its observed variables, and $\mathbf{W}$ be its hidden functional variables ($\mathbf{W} \cap \mathbf{O} = \emptyset$). The functional projection of G on $\mathbf{O}$ is a DAG obtained by functionally eliminating variables ($\mathbf{W}$) from G, then projecting the resulting DAG on variables ($\mathbf{O}$).
We now contrast functional projection and classical projection using the causal graph in Figure 3a, assuming that the observed variables are A, B, G, H, and I and the functional variables are C and D. Applying classical projection to this causal graph yields the causal graph in Figure 3b. To apply functional projection, we first functionally eliminate C and D from Figure 3a, which yields Figure 3c; then, we project Figure 3c on variables (A, B, G, H, I), which yields the causal graph in Figure 3d. So we now need to contrast Figure 3b (classical projection) with Figure 3d (functional projection). The edges of the latter are a strict subset of the edges of the former, as the latter is missing two bidirected edges. One implication of this is that variables G and I are not d-separated by A in Figure 3b because they are not d-separated in Figure 3a. However, they are D-separated in Figure 3a; hence, they are d-separated in Figure 3d. So functional projection yields a DAG that exhibits more independencies. Again, this is because G and I are D-separated by A in the original DAG, a fact that is not visible to the projection but is visible to (and exploitable by) the functional projection.
An important corollary of functional projection is the following.
Corollary 2.
Let G be a DAG; $\mathbf{O}$ be its observed variables; $\mathbf{W}$ be its functional variables, which are all hidden; and $G'$ be the result of functionally projecting G on $\mathbf{O}$. For any disjoint sets ($\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$) in $G'$, $\mathbf{X}$ and $\mathbf{Y}$ are d-separated by $\mathbf{Z}$ in $G'$ iff $\mathbf{X}$ and $\mathbf{Y}$ are D-separated by $\mathbf{Z}$ in G.
In other words, classical projection preserves d-separation, but functional projection preserves D-separation, which subsumes d-separation. Corollary 2 is a bit more subtle and powerful than it may first seem. First, it concerns D-separations based on hidden functional variables, not all functional variables. Secondly, it shows that such D-separations in G appear as classical d-separations in $G'$, which allows us to feed $G'$ into existing identifiability algorithms, as we show later. This is a key enabler of some results we present next on the testing of functional identifiability.
5. Causal Identification with Functional Dependencies
Consider the causal graph (G) in Figure 4a and let $\mathbf{O}$ be its observed variables. According to Definition 4 of identifiability, the causal effect of X on Y is not identifiable with respect to $(G, \mathbf{O}, \mathbb{C})$, where $\mathbb{C} = \{Pr(\mathbf{O}) > 0\}$. We can show this by projecting the causal graph (G) on the observed variables ($\mathbf{O}$), which yields the causal graph ($G'$) in Figure 4b, then applying the ID algorithm to $G'$, which returns FAIL. Suppose now that the hidden variable (B) is known to be functional. According to Definition 6 of F-identifiability, this additional knowledge reduces the number of considered models, so it actually renders the causal effect identifiable (we derive the identifying formula later). Hence, an unidentifiable causal effect becomes identifiable in light of knowledge that some variable is functional, even without knowing the structural equations for this variable.
Figure 4.
B is functional. (a) DAG; (b) projection.
The question now is how to algorithmically test F-identifiability. We propose two techniques for this purpose, the first of which is geared towards exploiting existing algorithms for classical identifiability. This technique is based on the elimination of functional variables from the causal graph while preserving F-identifiability, with the goal of getting to a point where F-identifiability becomes equivalent to classical identifiability. If we reach this point, we can use existing algorithms for classical identifiability, like the ID algorithm, to test F-identifiability. This can be subtle, though, since hidden functional variables behave differently from observed ones. We start with the following result.
Theorem 2.
Let $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ be an F-identifiability tuple. If $G'$ is the result of functionally eliminating the hidden functional variables ($\mathbf{W} \setminus \mathbf{O}$) from G, then the causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is F-identifiable with respect to $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ iff it is F-identifiable with respect to $(G', \mathbf{O}, \mathbb{C}, \mathbf{W} \cap \mathbf{O})$.
An immediate corollary of this theorem is that if all functional variables are hidden, then we can reduce the question of F-identifiability to identifiability: in this case, $\mathbf{W} \cap \mathbf{O} = \emptyset$, so F-identifiability with respect to $(G', \mathbf{O}, \mathbb{C}, \emptyset)$ collapses into identifiability with respect to $(G', \mathbf{O}, \mathbb{C})$.
Corollary 3.
Let $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ be an F-identifiability tuple, where $\mathbb{C} = \{Pr(\mathbf{O}) > 0\}$ and the variables $\mathbf{W}$ are all hidden. If $G'$ is the result of functionally projecting G on variables ($\mathbf{O}$), then the causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is F-identifiable with respect to $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ iff it is identifiable with respect to $(G', \mathbf{O}, \mathbb{C})$. (We require the positivity constraint $Pr(\mathbf{O}) > 0$, as we suspect the projection operation in [26] requires it even though that was not made explicit in the paper; if not, then $\mathbb{C}$ can be empty in Corollary 3.)
This corollary suggests a method for using the ID algorithm, which is popular for testing identifiability, to establish F-identifiability by coupling ID with functional projection instead of classical projection. Consider the causal graph (G) in Figure 5a with observed variables of A, B, C, F, X, and Y. The causal effect of X on Y is not identifiable under $\mathbb{C} = \{Pr(\mathbf{O}) > 0\}$; projecting G on the observed variables ($\mathbf{O}$) yields the causal graph ($G'$) in Figure 5b, and the ID algorithm produces FAIL on $G'$. Suppose now that the hidden variables (D and E) are functional. To test whether the causal effect is F-identifiable using Corollary 3, we functionally project G on the observed variables ($\mathbf{O}$), which yields the causal graph ($G''$) in Figure 5c. Applying the ID algorithm to $G''$ produces an identifying formula; therefore, the causal effect of X on Y is F-identifiable.
Figure 5.
Variables A, B, C, F, X, and Y are observed. Variables D and E are functional (and hidden). (a) Causal graph; (b) proj. of (a); (c) F-proj. of (a); (d) F-elim. F; (e) F-elim. B.
We stress, again, that Corollary 3 and the corresponding F-identifiability algorithm apply only when all functional variables are hidden. We now treat the case when some of the functional variables are observed. The subtlety here is that, unlike hidden functional variables, eliminating an observed functional variable does not always preserve F-identifiability. However, the following result identifies conditions that guarantee the preservation of F-identifiability based on the notion of separability in Definition 7. If all observed functional variables satisfy these conditions, we can, again, reduce F-identifiability into identifiability, so we can exploit existing methods for identifiability like the ID algorithm and do-calculus.
Theorem 3.
Let $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ be an F-identifiability tuple. Let $\mathbf{W}'$ be a set of observed functional variables that are neither treatments nor outcomes, are separable from $\mathbb{C}$, and have observed parents. If $G'$ is the result of functionally eliminating variables ($\mathbf{W}'$) from G, then the causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is F-identifiable with respect to $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ iff it is F-identifiable with respect to $(G', \mathbf{O} \setminus \mathbf{W}', \mathbb{C}, \mathbf{W} \setminus \mathbf{W}')$.
Intuitively, the theorem allows us to remove observed functional variables from a causal graph if they satisfy the given conditions. We now have the following important corollary of Theorems 2 and 3, which subsumes Corollary 3.
Corollary 4.
Let $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ be an F-identifiability tuple, where $\mathbb{C} = \{Pr(\mathbf{O} \setminus \mathbf{W}) > 0\}$ and every variable in $\mathbf{W} \cap \mathbf{O}$ satisfies the conditions of Theorem 3. If $G'$ is the result of functionally projecting G on $\mathbf{O} \setminus \mathbf{W}$, then the causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is F-identifiable with respect to $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ iff it is identifiable with respect to $(G', \mathbf{O} \setminus \mathbf{W}, \mathbb{C})$.
Consider, again, the causal effect of X on Y in graph G of Figure 5a with observed variables of A, B, C, F, X, and Y. Suppose now that the observed variable (F) is also functional (in addition to hidden functional variables D and E) and assume $\mathbb{C} = \{Pr(A, B, C, X, Y) > 0\}$. Using Corollary 4, we can functionally project G on A, B, C, X, and Y to yield the causal graph ($G'$) in Figure 5d, which reduces F-identifiability on G to classical identifiability on $G'$. Since strict positivity holds in $G'$, we can apply any existing identifiability algorithm and conclude that the causal effect is not identifiable. For another scenario, suppose that the observed variable (B) (instead of F) is functional and we have $\mathbb{C} = \{Pr(A, C, F, X, Y) > 0\}$. Again, using Corollary 4, we functionally project G onto A, C, F, X, and Y to yield the causal graph ($G''$) in Figure 5e, which reduces F-identifiability on G to classical identifiability on $G''$. If we apply the ID algorithm to $G''$, we obtain an identifying formula, which we refer to as Equation (A1). In both scenarios presented above, we were able to test F-identifiability using an existing algorithm for identifiability.
Corollary 4 (and Theorem 3) has yet another key application: it can help us pinpoint observations that are not essential for identifiability. To illustrate, consider the second scenario presented above, where the observed variable (B) is functional in the causal graph (G) of Figure 5a. The fact that Corollary 4 allowed us to eliminate variable B from G implies that observation of this variable is not needed to render the causal effect F-identifiable and, hence, is not needed to compute the causal effect. This can be seen by examining the identifying formula (Equation (A1)), which does not contain variable B. This can be further generalized to the causal graph on the right with an unbounded number of observed functional variables, where we assume positivity over the remaining observed variables. According to Corollary 4, we can functionally project the graph onto those remaining variables while preserving F-identifiability. Moreover, applying the ID algorithm (or do-calculus) to G yields an identifying formula for the causal effect over only a constant number of variables; that is, in this example, we only need to observe a constant number (five) of variables to render the causal effect F-identifiable, even though the number of observed variables in the original graph is unbounded. This application of Corollary 4 can be quite significant in practice, especially when some variables are expensive to measure (observe) or when they may raise privacy concerns (see, e.g., [43,44]).

Theorems 2 and 3 are more far-reaching than what the above discussion may suggest. In particular, even if we cannot eliminate every (observed) functional variable using these theorems, we may still be able to reduce F-identifiability to identifiability due to the following result.
Theorem 4.
Let $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ be an F-identifiability tuple. If every functional variable has at least one hidden parent, then a causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is F-identifiable with respect to $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ iff it is identifiable with respect to $(G, \mathbf{O}, \mathbb{C})$.
That is, if we still have functional variables in the causal graph after applying Theorems 2 and 3 and if each such variable has at least one hidden parent, then F-identifiability is equivalent to identifiability. Consider, again, the causal effect of X on Y in G of Figure 5a with observed variables of A, B, C, F, X, and Y. Now, suppose that the observed variables (A, B, C, X, and Y) are also functional (in addition to hidden functional variables D and E) and assume $\mathbb{C} = \{Pr(A, C, F, X, Y) > 0\}$. We can reduce F-identifiability to classical identifiability by combining Theorems 3 and 4. In particular, according to Theorem 3, we first reduce F-identifiability on G to F-identifiability on the graph ($G'$) in Figure 5e by functionally eliminating D, E, and B. Since all the remaining functional variables (A, C, X, and Y) have a hidden parent, we can further reduce F-identifiability to identifiability on $G'$ according to Theorem 4, then apply existing algorithms (e.g., ID and do-calculus) to conclude that the causal effect is identifiable.
The method we have presented thus far for the testing of F-identifiability is based on the elimination of functional variables from the causal graph, followed by the application of existing tools for causal effect identification, such as the project-ID algorithm and do-calculus. This F-identifiability method is complete if every observed functional variable either satisfies the conditions of Theorem 3 or has at least one hidden parent that is not functional.
We next present another technique for reducing F-identifiability to identifiability. This method is more general and much more direct than the previous one, but it does not allow us to fully exploit some existing tools, like the ID algorithm, due to the positivity assumptions they make. The new method involves pretending that some of the hidden functional variables are actually observed, inspired by Proposition 2, which reduces D-separation to d-separation using a similar technique.
Theorem 5.
Let $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ be an F-identifiability tuple, where $\mathbb{C} = \{Pr(\mathbf{X}) > 0\}$. A causal effect of $\mathbf{X}$ on $\mathbf{Y}$ is F-identifiable with respect to $(G, \mathbf{O}, \mathbb{C}, \mathbf{W})$ iff it is identifiable with respect to $(G, \mathbf{O}^+, \mathbb{C})$, where $\mathbf{O}^+$ is obtained as follows. Initially, $\mathbf{O}^+ = \mathbf{O}$. The following step is repeated until $\mathbf{O}^+$ stops changing: a functional variable from $\mathbf{W}$ is added to $\mathbf{O}^+$ if its parents are in $\mathbf{O}^+$.
Consider the causal effect of X on Y in graph G of Figure 5a and suppose the observed variables are A, B, C, F, X, and Y; the functional variables are D, E, and F; and we have $\mathbb{C} = \{Pr(X) > 0\}$. According to Theorem 5, the causal effect of X on Y is F-identifiable iff it is identifiable in G while pretending that variables A, B, C, D, E, F, X, and Y are all observed. In this case, the causal effect is not identifiable, but we cannot obtain this answer by applying an identifiability algorithm that requires positivity constraints that are stronger than $\mathbb{C}$. If we have stronger positivity constraints ($\mathbb{C}'$) that imply $\mathbb{C}$, then only the if part of Theorem 5 holds, assuming $\mathbb{C}'$ and $\mathbf{W}$ are consistent; that is, confirming identifiability with respect to $(G, \mathbf{O}^+, \mathbb{C}')$ confirms F-identifiability with respect to $(G, \mathbf{O}, \mathbb{C}', \mathbf{W})$, but if identifiability is not confirmed, then F-identifiability may still hold. This suggests that to fully exploit the power of Theorem 5, one would need a new class of identifiability algorithms that can operate under the weakest possible positivity constraints.
6. Experiments
We next report on a simple experiment to empirically demonstrate how knowledge of functional dependencies can aid the identifiability of causal effects. We randomly generated 50 causal graphs (DAGs) with $N$ variables using the Erdős–Rényi method [45], where every edge in the causal graphs appears with a fixed probability and every variable has, at most, 6 parents. We then randomly picked observed variables, treatment variables, outcome variables, and functional variables ($W$ of them) from the causal graphs.
For each combination of N and W, Table 1 records the number of causal effects (out of 50) that are (1) unidentifiable (uid); and (2) unidentifiable but F-identifiable (uid-fid). The table also records the average number of observed variables after applying Theorems 2–4 (#obs); these are the observed variables passed to the project-ID algorithm (We assume that strict positivity holds for the remaining observed variables after applying Theorems 2–4).
Table 1.
Numbers of causal effects that are unidentifiable (uid) and that are unidentifiable but F-identifiable (uid-fid) and average number of observed variables passed to project-ID (#obs) for causal graphs with various numbers of variables (N) and functional ones (W).
The following patterns are clear. First, more unidentifiable causal effects become F-identifiable when more variables exhibit functional dependencies. This observation demonstrates that knowledge of functional dependencies can greatly improve the identifiability of causal effects. Secondly, the number of observed variables required by the project-ID algorithm becomes smaller when there are more functional variables, implying that we only need to collect data on a smaller set of variables to estimate the (identifiable) causal effects. Again, this is because more observed functional variables can be functionally eliminated from the causal graphs by Theorem 3.
7. Conclusions
We studied the identification of causal effects in the presence of a particular type of knowledge called functional dependencies. This augments earlier works that considered other types of knowledge, such as context-specific independence. Our contributions include the formalization of the notion of functional identifiability; the introduction of an operation for eliminating functional variables from a causal graph that comes with stronger guarantees compared to earlier elimination methods; and the employment (under some conditions) of existing algorithms, such as the ID algorithm, for the testing of functional identifiability and to obtain identifying formulas. We further provided a complete reduction of functional identifiability to classical identifiability under very weak positivity constraints and showed how our results can be used to reduce the number of variables needed in observational data. Last but not least, we proposed a more general definition of identifiability based on a broader class of positivity assumptions, which opens the door to uncovering causal identification algorithms that operate under weaker positivity assumptions.
Author Contributions
Conceptualization, Y.C. and A.D.; Formal analysis, Y.C. and A.D.; Funding acquisition, A.D.; Investigation, Y.C. and A.D.; Methodology, Y.C. and A.D.; Project administration, A.D.; Resources, Y.C. and A.D.; Supervision, A.D.; Validation, Y.C. and A.D.; Visualization, Y.C. and A.D.; Writing—original draft, Y.C. and A.D.; Writing—review and editing, Y.C. and A.D. All authors have read and agreed to the published version of the manuscript.
Funding
This work was partially supported by ONR grant N000142212501.
Institutional Review Board Statement
Not applicable.
Data Availability Statement
Data are contained within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A. More on Projection and the ID Algorithm
As mentioned in the main paper, the project-ID algorithm involves two steps: the projection operation and the ID algorithm. We review more technical details of each step in this section.
Appendix A.1. Projection
The projection [25,26,28] of G onto $\mathbf{O}$ constructs a new DAG ($G'$) over variables ($\mathbf{O}$) as follows. Initially, DAG $G'$ contains variables ($\mathbf{O}$) but no edges. Then, for every pair of variables ($X, Y \in \mathbf{O}$), an edge is added from X to Y to $G'$ if X is a parent of Y in G or if there exists a directed path from X to Y in G such that none of the internal nodes on the path is in $\mathbf{O}$. Furthermore, a bidirected edge ($X \leftrightarrow Y$) is added between every pair of variables (X and Y) in $G'$ if there exists a divergent path between X and Y in G such that none of the internal nodes on the path is in $\mathbf{O}$ (a divergent path between X and Y is a path of the form $X \leftarrow \cdots \leftarrow H \rightarrow \cdots \rightarrow Y$). For example, the projection of the DAG in Figure 1a onto $\mathbf{O}$ yields Figure 1c. A bidirected edge ($X \leftrightarrow Y$) is a compact notation for $X \leftarrow H \rightarrow Y$, where H is an auxiliary hidden variable. Hence, the projected DAG in Figure 1c can be interpreted as a classical DAG but with additional, hidden root variables.
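The construction can be sketched as follows (our own code; we encode the graph as a parents dict, return directed edges as ordered pairs and bidirected edges as unordered pairs, and locate divergent paths through their hidden fork node):

```python
def project(parents, observed):
    """Project a DAG onto observed variables: add X -> Y when some
    directed path from X to Y has no observed internal node; add a
    bidirected edge {X, Y} when some divergent path between X and Y
    (forking at a hidden node) has no observed internal node."""
    children = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].add(v)

    def reach(root):
        """Observed nodes reachable from root via hidden internal nodes."""
        out, stack, seen = set(), [root], {root}
        while stack:
            for c in children[stack.pop()]:
                if c in observed:
                    out.add(c)
                elif c not in seen:
                    seen.add(c)
                    stack.append(c)
        return out

    directed = {(x, y) for x in observed for y in reach(x)}
    bidirected = set()
    for h in set(parents) - observed:
        r = sorted(reach(h))
        bidirected |= {(a, b) for i, a in enumerate(r) for b in r[i + 1:]}
    return directed, bidirected

# X -> H -> Y with hidden H: projection yields the directed edge X -> Y
# and no bidirected edge.
print(project({"X": set(), "H": {"X"}, "Y": {"H"}}, {"X", "Y"}))
```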
The projection operation is guaranteed to produce a DAG ($G'$) in which hidden variables are all roots and each has exactly two children. Graphs that satisfy this property are called semi-Markovian and can be fed as inputs to the ID algorithm for testing of identifiability [10]. Moreover, projection preserves some properties of G, such as d-separation [25] among the variables ($\mathbf{O}$), which guarantees that identifiability is preserved when working with $G'$ instead of G [26].
Appendix A.2. ID Algorithm
After obtaining a projected causal graph, we can apply the ID algorithm for identification of causal effects [10,46]. The algorithm returns either an identifying formula if the causal effect is identifiable or FAIL otherwise. The algorithm is sound, since each line of the algorithm can be proven with basic probability rules and do-calculus. The algorithm is also complete, since a causal graph must contain a hedge, a graphical structure that induces unidentifiability, if the algorithm returns FAIL. However, the algorithm is only sound and complete under certain positivity constraints, which are weaker but more subtle than strict positivity ($Pr(\mathbf{O}) > 0$).
The positivity constraints required by ID can be summarized as follows ($\mathbf{X}$ represents the treatment variables): (1) positivity over the non-treatment observed variables, i.e., $Pr(\mathbf{O} \setminus \mathbf{X}) > 0$; and (2) positivity of all quantities considered by the ID algorithm. The second constraint depends on a particular run of the ID algorithm and can be interpreted as follows. First, if the ID algorithm returns FAIL, then the causal effect is not identifiable, even under the strict positivity constraint of $Pr(\mathbf{O}) > 0$. However, if the ID algorithm returns “identifiable”, then the causal effect is identifiable under the above constraints, which are now well defined given a particular run of the ID algorithm. We illustrate with an example next.
Consider the causal graph on the right, which contains observed variables A, B, C, X, and Y. Suppose we are interested in the causal effect of X on Y. Applying the ID algorithm returns an identifying formula. The positivity constraint extracted from this run of the algorithm is $Pr(a, b, c, x) > 0$ for all of a, b, c, and x; that is, we can only safely declare the causal effect identifiable based on the ID algorithm if this positivity constraint is satisfied.

Appendix B. Functional Elimination for CBNs
The functional elimination in Definition 8 removes functional variables from a DAG (G) and yields another DAG ($G'$) for the remaining variables. We showed in the main paper that functional elimination preserves D-separations. Here, we extend the notion of functional elimination to causal Bayesian networks (CBNs), which contain not only a causal graph (DAG) but also CPTs. We show that (extended) functional elimination preserves the marginal distribution over the remaining variables; that is, given any CBN with causal graph G, we can construct another CBN with causal graph $G'$ such that the two CBNs induce the same distribution over the variables of $G'$, where $G'$ is the result of eliminating functional variables from G. Moreover, we show that the functional elimination operation further preserves the causal effects, which makes it applicable to causal identification. This extended version of functional elimination and the corresponding results are used in the proofs in Appendix C.
Recall that a CBN contains a causal graph (G) and a set of CPTs ($\Theta$). We first extend the definition of functional elimination (Definition 8) from DAGs to CBNs.
Definition A1.
The functional elimination of a functional variable (X) from a CBN ($G, \Theta$) yields another CBN ($G', \Theta'$) obtained as follows. The DAG ($G'$) is obtained from G according to Definition 8. For each child (C) of X, its CPT in $\Theta'$ is $\sum_x \theta_{C|\mathbf{P}} \cdot \theta_{X|\mathbf{Q}}$, where $\theta_{C|\mathbf{P}}$ and $\theta_{X|\mathbf{Q}}$ are the corresponding CPTs in $\Theta$ (the sum ranges over the states $x$ of X).
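The CPT update can be sketched as follows (our own dict-based encoding; the example network and all names are assumptions). The new CPT for a child C of X sums out X against X's CPT:

```python
from itertools import product

def eliminate_cpt(cpt_c, c_parents, cpt_x, x_parents, x):
    """New CPT for a child C after eliminating X (as in Definition A1):
    g(c | u) = sum over x of Pr(c | parents of C, with X set to x)
    times Pr(x | parents of X), for binary variables."""
    keep = [p for p in c_parents if p != x]
    new_parents = keep + [p for p in x_parents if p not in keep]
    g = {}
    for vals in product([0, 1], repeat=len(new_parents) + 1):
        u, c_val = dict(zip(new_parents, vals[:-1])), vals[-1]
        total = 0.0
        for x_val in (0, 1):
            w = {**u, x: x_val}
            total += (cpt_c[tuple(w[p] for p in c_parents) + (c_val,)]
                      * cpt_x[tuple(w[p] for p in x_parents) + (x_val,)])
        g[vals] = total
    return new_parents, g

# A -> X (functional, X = A) and X -> C: eliminate X to get Pr(C | A).
cpt_x = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}  # Pr(X | A)
cpt_c = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.9}  # Pr(C | X)
print(eliminate_cpt(cpt_c, ["X"], cpt_x, ["A"], "X"))
# (['A'], {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.9})
```

Here the CPT keys flatten the parent instantiation and the child value into one tuple; this encoding, like the rest of the sketch, is an arbitrary choice of ours.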
We first show that the new CPTs produced by Definition A1 are well defined.
Proposition A1.
Let $\theta_{X|\mathbf{Q}}$ and $\theta_{Y|\mathbf{P}}$ be the CPTs for variables X and Y in a CBN; then, $\sum_x \theta_{Y|\mathbf{P}} \cdot \theta_{X|\mathbf{Q}}$ is a valid CPT for Y.
The next proposition shows that functional elimination preserves the functional dependencies.
Proposition A2.
Let ($G', \Theta'$) be the CBN resulting from functionally eliminating a functional variable from a CBN (($G, \Theta$)). Then, each variable (from $G'$) is functional in ($G', \Theta'$) if it is functional in ($G, \Theta$).
The next theorem shows that the order of functional elimination does not matter.
Proposition A3.
Let ($G, \Theta$) be a CBN and $\pi_1$ and $\pi_2$ be two variable orders over a set of functional variables ($\mathbf{W}$). Then, functionally eliminating $\mathbf{W}$ from ($G, \Theta$) according to $\pi_1$ and according to $\pi_2$ yields the same CBN.
The next result shows that eliminating functional variables preserves the marginal distribution.
Theorem A1.
Consider a CBN (($G, \Theta$)) that induces Pr. Let ($G', \Theta'$) be the result of functionally eliminating a set of functional variables ($\mathbf{W}$) from ($G, \Theta$), which induces $Pr'$. Then, $Pr'(\mathbf{V}') = Pr(\mathbf{V}')$, where $\mathbf{V}'$ are the variables of $G'$.
One key property of functional elimination is that it preserves the interventional distribution over the remaining variables. This property allows us to eliminate functional variables from a causal graph and estimate the causal effects in the resulting graph.
Theorem A2.
Let ($G', \Theta'$) be the CBN over variables ($\mathbf{V}'$) resulting from functionally eliminating a set of functional variables ($\mathbf{W}$) from a CBN (($G, \Theta$)). Then, ($G, \Theta$) and ($G', \Theta'$) attain the same $Pr_{\mathbf{x}}(\mathbf{y})$ for any $\mathbf{X}, \mathbf{Y} \subseteq \mathbf{V}'$.
Appendix C. Proofs
The proofs of the results are ordered slightly differently from the order in which they appear in the main body of the paper.
Proof of Proposition 1.
Our goal is to construct two different parameterizations ($\Theta^1$ and $\Theta^2$) that induce the same $Pr(\mathbf{O})$ but different $Pr_{\mathbf{x}}(\mathbf{Y})$. This is done by first creating a parameterization ($\Theta$) that contains strictly positive CPTs for all variables, then constructing $\Theta^1$ and $\Theta^2$ based on $\Theta$.
Let $\pi$ be the directed path from X to Y, which does not contain any treatment variables other than X. For each node M on the path, let $\mathbf{P}_M$ be the parents of M, except for the parent that lies on $\pi$. Moreover, for each variable (M) on $\pi$, we only modify the conditional probability for a single state of M, given the treated state of X. Let $\epsilon$ be an arbitrarily small constant (close to 0). Next, we show the modifications for the CPTs in $\Theta^1$ and $\Theta^2$.
For every variable () that has parent Q on the path , we assign
We assign the same CPTs for X and all variables () but a different CPT for Z in .
The two parameterizations ( and ) induce the same , where if and otherwise. Next, we show that the parameterization satisfies each positivity constraint () as long as it does not imply . We first show that implies . This is because and there must exist some instantiation () where and by constraint. This implies and, therefore, . Hence, does not contain such a constraint () where . Suppose ; then, if and only if . Moreover, since , whenever , it is guaranteed that when , which implies . Finally, suppose ; then, . Hence, the positivity constraint is satisfied by both parameterizations. By construction, and induce different values for the causal effect (), since the probability of under treatment differs under the two parameterizations. □
Proposition A4.
Let G be a causal graph and $\mathbf{O}$ be its observed variables. A set of functional variables ($\mathbf{W}$) is consistent with positivity constraints ($\mathbb{C}$) if no single constraint in $\mathbb{C}$ mentions both $W \in \mathbf{W}$ and a set ($\mathbf{A}$) that intercepts all directed paths from non-functional variables to W.
Proof of Proposition A4.
We construct a parameterization () and show that the distribution (Pr) induced by satisfies , which ensures consistency. The states of each variable (V) are represented in the form of , where and () are both binary indicators (0 or 1). Specifically, each corresponds to a “functional descendant path” of V defined as follows: A functional descendant path of V is a directed path that starts with V, and all variables on the path (excluding V) are functional. Suppose V does not have any functional descendant path; then, the states of V are simply represented as .
Next, we show how to assign CPTs for each variable in the causal graph (G) based on whether the variable is functional. For each non-functional variable, we assign a uniform distribution. For each functional variable (W) whose parents are and whose functional descendant paths are , we assign the CPT () as follows:
where denotes the index assigned to the path (which contains a single edge) in the state of and denotes the indicator in the state of for functional descendant path that contains functional descendant path , i.e., .
For simplicity, we call a set of variables that satisfies the condition in the proposition a “functional ancestor set” of $W$. We show that $Pr(\mathbf{s}) > 0$ for each positivity constraint of the form $Pr(\mathbf{s}) > 0$ over variables $\mathbf{S}$. Let $\mathbf{W} \subseteq \mathbf{S}$ be the subset of functional variables. Since $\mathbf{S}$ does not contain any functional ancestor set of $W$ for each $W \in \mathbf{W}$, it follows that there exist directed paths from a set of non-functional variables $\mathbf{N}$ to $W$ that are unblocked by $\mathbf{S}$ and contain only functional variables (excluding $\mathbf{N}$). We can further assume that $\mathbf{N}$ is chosen such that the set forms a valid functional ancestor set for $W$. Next, we show that for any state $w$ of $W$ and instantiation $\mathbf{s}$ of $\mathbf{S}$, there exists at least one instantiation $\mathbf{n}$ of $\mathbf{N}$ such that $Pr(w, \mathbf{s}, \mathbf{n}) > 0$.
Let $\mathcal{A}$ denote the set of all directed paths from $\mathbf{N}$ to $W$ that do not contain $\mathbf{S}$ (except for the first node on the path). Let $\mathcal{A}_1$ be the paths that start with a variable in $\mathbf{S}$ and $\mathcal{A}_2$ be the other paths, which start with a variable in $\mathbf{N} \setminus \mathbf{S}$. Moreover, for any path $\pi \in \mathcal{A}$, let $\iota(\pi)$ be the binary indicator (e.g., $v_i$) for $\pi$ in the state of the first variable on $\pi$. Since the value assignments of $\iota(\pi)$ are independent for different $\pi$s, we can always find some instantiation $\mathbf{n}$ under which these indicators jointly produce the state $w$ given $\mathbf{s}$.
We next assign values for the other path indicators of $\mathbf{N}$ such that the indicators for the functional descendant paths in state $w$ are set correctly. In particular, for each functional descendant path $\pi$ of $W$, let $\mathcal{B}$ be the set of functional descendant paths that do not contain $\mathbf{S}$ (except for the first node on the path) and that contain $\pi$ as a sub-path. Let $\mathcal{B}_1$ be the paths in $\mathcal{B}$ that start with a variable in $\mathbf{S}$ and $\mathcal{B}_2$ be the other paths, which start with a variable in $\mathbf{N} \setminus \mathbf{S}$. Again, since all the indicators for paths in $\mathcal{B}_2$ are independent, we can assign the indicators for $\mathcal{B}_2$ such that the indicator for $\pi$ in the state $w$ is set correctly.
Finally, we combine the cases for each individual $W \in \mathbf{W}$ by combining the assignments constructed above. Since all the functional descendant paths we considered for different $W$s are disjoint, we can always find an assignment for the variables outside $\mathbf{S}$ that is consistent with the functional dependencies (i.e., that does not produce any zero probabilities). Consequently, there must exist some full instantiation $\mathbf{v}$ compatible with $w$, $\mathbf{s}$, and $\mathbf{n}$ such that $Pr(\mathbf{v}) > 0$, which implies $Pr(\mathbf{s}) > 0$. □
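The notion of a functional descendant path is easy to make concrete in code. The following is a sketch under our own hypothetical graph encoding (a `children` adjacency map and a set of functional variables); it is not code from the paper.

```python
def functional_descendant_paths(children, functional, v):
    """Enumerate directed paths starting at v whose non-initial nodes
    are all functional. Assumes the graph is a DAG, so recursion ends."""
    paths = []
    def extend(path):
        for child in children.get(path[-1], []):
            if child in functional:
                new = path + [child]
                paths.append(new)     # every prefix is itself such a path
                extend(new)
    extend([v])
    return paths

children = {"V": ["W1", "A"], "W1": ["W2"], "W2": [], "A": []}
functional = {"W1", "W2"}             # A is non-functional
print(functional_descendant_paths(children, functional, "V"))
# [['V', 'W1'], ['V', 'W1', 'W2']]
```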
Proof of Proposition A1.
Suppose $Y$ is not a child of $X$ in the CBN; then, $g = \theta_{Y|\mathbf{P}_Y}$, which is guaranteed to be a CPT for $Y$. Suppose $Y$ is a child of $X$. Let $\mathbf{P}$ denote the parents of $X$ and $\mathbf{Q}$ denote the parents of $Y$, excluding $X$. The new factor $g$ is defined over $\{Y\} \cup \mathbf{Q} \cup \mathbf{P}$. Consider each instantiation $\mathbf{q}$ and $\mathbf{p}$; then, $\sum_y g(y, \mathbf{q}, \mathbf{p}) = \sum_y \sum_x \theta(x|\mathbf{p})\,\theta(y|x, \mathbf{q}) = \sum_x \theta(x|\mathbf{p}) \sum_y \theta(y|x, \mathbf{q}) = 1$. Hence, $g$ is a CPT for $Y$. □
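The computation in this proof can be checked numerically. Below is a minimal sketch with toy numbers of our own choosing: we sum a functional parent $X$ out of the CPT of its child $Y$ and verify that the resulting factor $g$ is a well-formed CPT (its rows sum to 1).

```python
from itertools import product

P, Q, X, Y = (0, 1), (0, 1), (0, 1), (0, 1)

f = {0: 1, 1: 0}                      # the function behind X: x = f(p)
theta_x = {(x, p): 1.0 if x == f[p] else 0.0 for x, p in product(X, P)}
theta_y = {(y, x, q): ((0.9 if y == x else 0.1) if q == 0 else 0.5)
           for y, x, q in product(Y, X, Q)}

# g(y, q, p) = sum_x theta(x | p) * theta(y | x, q)
g = {(y, q, p): sum(theta_x[x, p] * theta_y[y, x, q] for x in X)
     for y, q, p in product(Y, Q, P)}

for q, p in product(Q, P):
    assert abs(sum(g[y, q, p] for y in Y) - 1.0) < 1e-12
print("g is a well-formed CPT for Y given (Q, P)")
```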
Proof of Proposition A2.
Let $X$ be the functional variable that is functionally eliminated. By definition, the elimination only affects the CPTs of the children of $X$. Hence, any functional variable that is not a child of $X$ remains functional. For each child $C$ of $X$ that is functional, the new CPT only contains values that are either 0 or 1, since both $\theta_{X|\mathbf{P}}$ and $\theta_{C|\mathbf{P}_C}$ are functional. □
Proof of Proposition A3.
First, note that $\pi_2$ can always be obtained from $\pi_1$ by a sequence of “transpositions”, where each transposition swaps two adjacent variables in the first sequence. Let $\pi = X_1, \dots, X_i, X_{i+1}, \dots, X_n$ be an elimination order, and let $\pi'$ be the elimination order resulting from swapping $X_i$ and $X_{i+1}$ in $\pi$, i.e., $\pi' = X_1, \dots, X_{i+1}, X_i, \dots, X_n$.
We show that functional elimination according to $\pi$ and $\pi'$ yields the same CBN, which can be applied inductively to conclude that elimination according to $\pi_1$ and $\pi_2$ yields the same CBN. Since $\pi$ and $\pi'$ agree on the elimination order up to $X_i$ and $X_{i+1}$, they yield the same CBN before eliminating those two variables. It suffices to show that the CBNs resulting from the $\pi$ and $\pi'$ eliminations of $X_i$ and $X_{i+1}$ are the same. Let $\mathcal{N}$ be the CBN before eliminating these variables, and write $X = X_i$ and $Y = X_{i+1}$. Suppose $X$ and $Y$ do not belong to the same family (which contains a variable and its parents); then, the eliminations of $X$ and $Y$ are independent, and the order of elimination does not matter. Suppose $X$ and $Y$ belong to the same family; then, they are either parent and child or co-parents ($X$ and $Y$ are co-parents if they have a same child).
Without loss of generality, suppose $X$ is a parent of $Y$. Eliminating $X$ then $Y$ and eliminating $Y$ then $X$ yield the same causal graph, which is defined as follows: each child $C$ of $Y$ acquires the parents of $X$ and the parents of $Y$ (other than $X$ and $Y$ themselves), and any other child $C$ of $X$ acquires the parents of $X$. We next consider the CPTs. For each common child $C$ of $X$ and $Y$, its CPT resulting from eliminating $X$ then $Y$ is
$\sum_y \big(\sum_x \theta(x|\mathbf{p}_X)\,\theta(y|x, \cdot)\big)\big(\sum_x \theta(x|\mathbf{p}_X)\,\theta(c|x, y, \cdot)\big),$
and the CPT resulting from eliminating $Y$ then $X$ is
$\sum_x \theta(x|\mathbf{p}_X) \sum_y \theta(y|x, \cdot)\,\theta(c|x, y, \cdot).$
Since $X$ is a parent of $Y$ and $\theta_{X|\mathbf{P}_X}$ is functional, placing all of its mass on a single state $x^\star$ for each $\mathbf{p}_X$, both expressions reduce to $\sum_y \theta(y|x^\star, \cdot)\,\theta(c|x^\star, y, \cdot)$, so the two CPTs are equal.
Next, we consider the case when $C$ is a child of $Y$ but not a child of $X$. The CPT for $C$ resulting from eliminating $X$ then $Y$ is $\sum_y \big(\sum_x \theta(x|\mathbf{p}_X)\,\theta(y|x, \cdot)\big)\,\theta(c|y, \cdot)$, and the CPT resulting from eliminating $Y$ then $X$ is $\sum_x \theta(x|\mathbf{p}_X) \sum_y \theta(y|x, \cdot)\,\theta(c|y, \cdot)$. Again, since $X$ is a parent of $Y$, the two expressions are equal by distributing the sum over $x$.
Finally, we consider the case when $C$ is a child of $X$ but not a child of $Y$. Regardless of the order of $X$ and $Y$, the CPT for $C$ resulting from eliminating $X$ and $Y$ is $\sum_x \theta(x|\mathbf{p}_X)\,\theta(c|x, \cdot)$.
Next, we consider the case when $X$ and $Y$ are co-parents. Regardless of the order of $X$ and $Y$, the causal graph resulting from the elimination satisfies the following properties: (1) for each common child $C$ of $X$ and $Y$, the parents of $C$ are its remaining parents together with the parents of $X$ and the parents of $Y$; (2) the parents of each $C$ that is a child of $X$ but not a child of $Y$ are its remaining parents together with the parents of $X$; and (3) the parents of each $C$ that is a child of $Y$ but not a child of $X$ are its remaining parents together with the parents of $Y$. Next, we consider the CPTs. The CPT for each common child $C$ of $X$ and $Y$ resulting from eliminating $X$ then $Y$ is $\sum_y \theta(y|\mathbf{p}_Y) \sum_x \theta(x|\mathbf{p}_X)\,\theta(c|x, y, \cdot)$, and the CPT resulting from eliminating $Y$ then $X$ is $\sum_x \theta(x|\mathbf{p}_X) \sum_y \theta(y|\mathbf{p}_Y)\,\theta(c|x, y, \cdot)$. Since $X$ and $Y$ are not parent and child, $\theta_{X|\mathbf{P}_X}$ does not mention $Y$, $\theta_{Y|\mathbf{P}_Y}$ does not mention $X$, and the two expressions are equal by exchanging the order of summation.
For each $C$ that is a child of $X$ but not a child of $Y$, regardless of the order of $X$ and $Y$, the CPT for $C$ resulting from eliminating variables $X$ and $Y$ is $\sum_x \theta(x|\mathbf{p}_X)\,\theta(c|x, \cdot)$. A similar result holds for each $C$ that is a child of $Y$ but not a child of $X$. □
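The parent-and-child case above can be verified by brute force. The following sketch uses a hypothetical chain $A \to X \to Y$ with a common child $C$ of $X$ and $Y$, where $X$ and $Y$ are functional; eliminating $X$ then $Y$ yields the same CPT for $C$ as eliminating $Y$ then $X$. The CPT values are toy choices.

```python
from itertools import product

S = (0, 1)
fx, fy = {0: 1, 1: 1}, {0: 0, 1: 1}   # x = fx(a), y = fy(x)  (functional)
tx = {(x, a): 1.0 if x == fx[a] else 0.0 for x, a in product(S, S)}
ty = {(y, x): 1.0 if y == fy[x] else 0.0 for y, x in product(S, S)}
pc1 = [[0.2, 0.7], [0.4, 0.9]]        # Pr(C = 1 | x, y), toy numbers
tc = {(c, x, y): pc1[x][y] if c == 1 else 1.0 - pc1[x][y]
      for c, x, y in product(S, S, S)}

# Order 1: eliminate X (children Y and C), then Y (child C).
ty_a = {(y, a): sum(tx[x, a] * ty[y, x] for x in S) for y, a in product(S, S)}
tc_ay = {(c, a, y): sum(tx[x, a] * tc[c, x, y] for x in S)
         for c, a, y in product(S, S, S)}
c1 = {(c, a): sum(ty_a[y, a] * tc_ay[c, a, y] for y in S)
      for c, a in product(S, S)}

# Order 2: eliminate Y (child C), then X (child C).
tc_x = {(c, x): sum(ty[y, x] * tc[c, x, y] for y in S)
        for c, x in product(S, S)}
c2 = {(c, a): sum(tx[x, a] * tc_x[c, x] for x in S) for c, a in product(S, S)}

assert all(abs(c1[k] - c2[k]) < 1e-12 for k in c1)
print("both elimination orders yield the same CPT for C")
```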
Proof of Theorem A1.
It suffices to show that the distribution over the remaining variables is preserved when we eliminate a single variable $X$. Let $\Theta$ denote the set of CPTs of the CBN. Since $\theta_{X|\mathbf{P}}$ is a functional CPT for $X$, we can replicate it in $\Theta$, which yields a new CPT set (replication) $\Theta^r$ that induces the same distribution as $\Theta$ (see details in ([21], Theorem 4)). Specifically, we pair the CPT of each child $C_i$ of $X$ with an extra copy of $\theta_{X|\mathbf{P}}$, denoted $\theta^i_{X|\mathbf{P}}$, which yields a list of pairs $(\theta_{C_i|\mathbf{P}_i}, \theta^i_{X|\mathbf{P}})$, where $C_1, \dots, C_m$ are the children of $X$. Functionally eliminating $X$ from $\Theta^r$ yields ([21], Corollary 1) the CPT set
$\Gamma \cup \{\sum_x \theta^i_{X|\mathbf{P}}\,\theta_{C_i|\mathbf{P}_i} : i = 1, \dots, m\},$
where $\Gamma$ represents the CPTs in $\Theta$ that do not contain $X$ and each $\sum_x \theta^i_{X|\mathbf{P}}\,\theta_{C_i|\mathbf{P}_i}$ is the CPT for child $C_i$ in the resulting CBN, which therefore induces the same distribution over the remaining variables. □
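A quick numeric illustration of the theorem, again with a toy CBN of our own: functionally eliminating $X$ from the chain $A \to X \to C$ leaves the distribution over the remaining variables $\{A, C\}$ unchanged.

```python
from itertools import product

S = (0, 1)
ta = {0: 0.3, 1: 0.7}
f = {0: 1, 1: 0}                      # x = f(a)  (functional)
tx = {(x, a): 1.0 if x == f[a] else 0.0 for x, a in product(S, S)}
tc = {(c, x): 0.8 if c == x else 0.2 for c, x in product(S, S)}

# Joint of the original CBN, then sum out X.
before = {(a, c): sum(ta[a] * tx[x, a] * tc[c, x] for x in S)
          for a, c in product(S, S)}
# CBN after functionally eliminating X: C's new CPT is sum_x tx * tc.
tc_new = {(c, a): sum(tx[x, a] * tc[c, x] for x in S)
          for c, a in product(S, S)}
after = {(a, c): ta[a] * tc_new[c, a] for a, c in product(S, S)}

assert all(abs(before[k] - after[k]) < 1e-12 for k in before)
print("marginal over {A, C} preserved")
```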
- Proof of Theorem A2
Lemma A1.
Consider a CBN $\mathcal{N}$ and its mutilated CBN $\mathcal{N}_{\mathbf{x}}$ under treatment $\mathbf{X} = \mathbf{x}$. Let $W$ be a functional variable not in $\mathbf{X}$, and let $\mathcal{N}'$ and $\mathcal{M}$ be the results of functionally eliminating $W$ from $\mathcal{N}$ and $\mathcal{N}_{\mathbf{x}}$, respectively. Then, $\mathcal{M}$ is the mutilated CBN for $\mathcal{N}'$ under $\mathbf{X} = \mathbf{x}$.
Proof.
First, observe that the children of $W$ in $\mathcal{N}$ and $\mathcal{N}_{\mathbf{x}}$ can only differ by the variables in $\mathbf{X}$. Let $\mathbf{C}_1$ be the children of $W$ in both $\mathcal{N}$ and $\mathcal{N}_{\mathbf{x}}$, and let $\mathbf{C}_2$ be the children of $W$ in $\mathcal{N}$ but not in $\mathcal{N}_{\mathbf{x}}$. According to the definition of a mutilated CBN, $W$ has the same set of parents and the same CPT in $\mathcal{N}$ and $\mathcal{N}_{\mathbf{x}}$. Similarly, each child $C \in \mathbf{C}_1$ has the same set of parents and the same CPT in $\mathcal{N}$ and $\mathcal{N}_{\mathbf{x}}$. Hence, eliminating $W$ yields the same set of parents and the same CPT for each $C \in \mathbf{C}_1$ in $\mathcal{N}'$ and $\mathcal{M}$. Next, we consider the set of parents and the CPT of each child $C \in \mathbf{C}_2$, which must belong to $\mathbf{X}$. Since $W$ is not a parent of $C$ in $\mathcal{N}_{\mathbf{x}}$, variable $C$ has the same set of parents (empty) and the same CPT in $\mathcal{N}_{\mathbf{x}}$ and $\mathcal{M}$. Exactly the same set of parents (empty) and CPT are assigned to $C$ in the mutilated CBN for $\mathcal{N}'$. □
Proof of Theorem A2.
Consider a CBN $\mathcal{N}$ and its mutilated CBN $\mathcal{N}_{\mathbf{x}}$. Let $Pr$ and $Pr_{\mathbf{x}}$ be the distributions induced by $\mathcal{N}$ and $\mathcal{N}_{\mathbf{x}}$ over the variables outside the eliminated set $\mathbf{W}$, respectively. According to Lemma A1, we can eliminate each $W \in \mathbf{W}$ inductively from $\mathcal{N}$ and $\mathcal{N}_{\mathbf{x}}$ and obtain a CBN $\mathcal{N}'$ and its mutilated CBN $\mathcal{N}'_{\mathbf{x}}$. According to Theorem A1, the distribution induced by $\mathcal{N}'_{\mathbf{x}}$ is exactly $Pr_{\mathbf{x}}$. □
Proof of Proposition 2.
First, note that the extended set $\mathbf{Z}^+$ contains $\mathbf{Z}$ and all variables that are functionally determined by $\mathbf{Z}$. Consider any path $\alpha$ between some $X \in \mathbf{X}$ and some $Y \in \mathbf{Y}$. We show that $\alpha$ is blocked by $\mathbf{Z}^+$ iff it is blocked by $\mathbf{Z}$ according to the definition presented in [40]. We first show the if part. Suppose there is a convergent valve (see [38] (Ch. 4) for more details on convergent, divergent, and sequential valves) at a variable $W$ that is closed when conditioned on $\mathbf{Z}$; then, the valve is still closed when conditioned on $\mathbf{Z}^+$ unless the parents of $W$ are in $\mathbf{Z}^+$. However, the path $\alpha$ is blocked in the latter case, since the valves at the parents of $W$ on $\alpha$ must be sequential/divergent and closed. Suppose there is a sequential/divergent valve at a variable $W$ that is closed when conditioned on $\mathbf{Z}$ according to [40]; then, $W$ must be in $\mathbf{Z}^+$, since it is functionally determined by $\mathbf{Z}$. Hence, the valve is also closed when conditioned on $\mathbf{Z}^+$.
Next, we show the only-if part. Suppose a convergent valve at a variable $W$ is closed when conditioned on $\mathbf{Z}^+$; then, none of $\mathbf{Z}$ is a descendant of $W$, since $\mathbf{Z}^+$ is a superset of $\mathbf{Z}$. Suppose a sequential/divergent valve at a variable $W$ is closed when conditioned on $\mathbf{Z}^+$; then, $W$ is functionally determined by $\mathbf{Z}$ by the construction of $\mathbf{Z}^+$. Thus, the valve is closed as described in [40]. □
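The extended set $\mathbf{Z}^+$ is straightforward to compute: starting from $\mathbf{Z}$, repeatedly add any functional variable whose parents all lie in the current closure. The sketch below assumes a hypothetical `parents` map and functional set of our own.

```python
def functional_closure(parents, functional, z):
    """Return Z+: Z plus every functional variable whose parents all
    end up inside the closure (i.e., are functionally determined by Z)."""
    closure = set(z)
    changed = True
    while changed:
        changed = False
        for w in functional:
            if w not in closure and set(parents[w]) <= closure:
                closure.add(w)
                changed = True
    return closure

parents = {"W1": ["Z1"], "W2": ["W1", "Z2"], "W3": ["H"]}
functional = {"W1", "W2", "W3"}
print(sorted(functional_closure(parents, functional, {"Z1", "Z2"})))
# ['W1', 'W2', 'Z1', 'Z2']  (W3 has the hidden parent H, so it is excluded)
```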
Proof of Theorem 1.
By induction, it suffices to show that $\mathbf{X}$ and $\mathbf{Y}$ are D-separated by $\mathbf{Z}$ in $G$ iff they are D-separated by $\mathbf{Z}$ in $G'$, where $G'$ is the result of functionally eliminating a single variable $T \notin \mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$ from $G$. We first show the contrapositive of the if part. Suppose $\mathbf{X}$ and $\mathbf{Y}$ are not D-separated by $\mathbf{Z}$ in $G$; owing to the completeness of D-separation, there exists a parameterization $\Theta$ of $G$ such that $\mathbf{X}$ and $\mathbf{Y}$ are dependent given $\mathbf{Z}$ in the induced distribution. If we eliminate $T$ from the CBN $(G, \Theta)$, we obtain another CBN $(G', \Theta')$, where $\Theta'$ is the parameterization for $G'$. According to Theorem A1, the marginal probabilities are preserved for the variables in $G'$, which include $\mathbf{X} \cup \mathbf{Y} \cup \mathbf{Z}$. Hence, the dependence persists, and $\mathbf{X}$ and $\mathbf{Y}$ are not D-separated by $\mathbf{Z}$ in $G'$.
Next, consider the contrapositive of the only-if part. Suppose $\mathbf{X}$ and $\mathbf{Y}$ are not D-separated by $\mathbf{Z}$ in $G'$; then, there exists a parameterization $\Theta'$ of $G'$ such that $\mathbf{X}$ and $\mathbf{Y}$ are dependent given $\mathbf{Z}$, owing to the completeness of D-separation. We construct a parameterization $\Theta$ for $G$ such that $\Theta'$ is the parameterization of $G'$ that results from eliminating $T$ from the CBN $(G, \Theta)$. This is sufficient to show that $\mathbf{X}$ and $\mathbf{Y}$ are not D-separated by $\mathbf{Z}$ in $G$, since the marginals are preserved by Theorem A1.
- Construction Method: Let $\mathbf{P}$ and $\mathbf{C}$ denote the parents and children of $T$ in $G$, respectively. Our construction assumes that the cardinality of $T$ is the number of instantiations of its parents $\mathbf{P}$; that is, there is a one-to-one correspondence between the states of $T$ and the instantiations of $\mathbf{P}$, and we use $\mathbf{p}_t$ to denote the instantiation of $\mathbf{P}$ corresponding to state $t$. The functional CPT for $T$ is assigned as $\theta(t|\mathbf{p}) = 1$ if $\mathbf{p} = \mathbf{p}_t$ and $\theta(t|\mathbf{p}) = 0$ otherwise, for each instantiation $\mathbf{p}$ of $\mathbf{P}$. Now, consider each child $C \in \mathbf{C}$ that has parents $\mathbf{Q}$ (excluding $T$) and $T$ in $G$. It immediately follows from Definition 8 that $C$ has parents $\mathbf{Q} \cup \mathbf{P}$ in $G'$. Next, we construct the CPT $\theta_{C|T,\mathbf{Q}}$ in $\Theta$ based on its CPT $\theta'_{C|\mathbf{Q},\mathbf{P}}$ in $\Theta'$. Consider each parent instantiation $t, \mathbf{q}$, where $t$ is a state of $T$ and $\mathbf{q}$ is an instantiation of $\mathbf{Q}$. If $\mathbf{q}$ is consistent with $\mathbf{p}_t$, $\theta(c|t, \mathbf{q}) = \theta'(c|\mathbf{q}, \mathbf{p}_t)$ is assigned for each state $c$ (for clarity, we use the notation $|$ to separate a variable and its parents in a CPT). Otherwise, any functional distribution is assigned for $\theta(C|t, \mathbf{q})$. This construction ensures that the constructed CPT for $T$ is functional and that the functional dependencies among other variables are preserved. In particular, for each child $C$ of $T$, the constructed CPT $\theta_{C|T,\mathbf{Q}}$ is functional iff $\theta'_{C|\mathbf{Q},\mathbf{P}}$ is functional. This construction method is reused later in other proofs.
Now, we just need to show that the CBN $(G', \Theta')$ is the result of eliminating $T$ from the (constructed) CBN $(G, \Theta)$. In particular, we need to check that the CPT for each child $C \in \mathbf{C}$ in $\Theta'$ is correctly computed from the constructed CPTs in $\Theta$. For each instantiation $\mathbf{q}, \mathbf{p}$ and state $c$ of $C$,
$\sum_t \theta(t|\mathbf{p})\,\theta(c|t, \mathbf{q}) = \theta(c|t_{\mathbf{p}}, \mathbf{q}) = \theta'(c|\mathbf{q}, \mathbf{p}),$
where $t_{\mathbf{p}}$ is the state of $T$ such that $\mathbf{p}_{t_{\mathbf{p}}} = \mathbf{p}$. □
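The construction method lends itself to a direct numeric check. The sketch below uses hypothetical domains and toy numbers, and assumes $\mathbf{Q}$ and $\mathbf{P}$ are disjoint so the consistency condition is vacuous; it builds the one-to-one functional CPT for $T$ and verifies the displayed computation.

```python
from itertools import product

P1, P2, Q, C = (0, 1), (0, 1), (0, 1), (0, 1)
p_insts = list(product(P1, P2))       # states of T <-> instantiations of P
T = range(len(p_insts))

# Functional CPT for T: theta(t | p) = 1 iff p is the instantiation p_t.
theta_t = {(t, p): 1.0 if p_insts[t] == p else 0.0
           for t, p in product(T, p_insts)}

def pc1(q, p):                        # Pr(C = 1 | q, p) in G' (toy numbers)
    return 0.1 + 0.2 * q + 0.1 * p[0] + 0.3 * p[1]

theta_c_reduced = {(c, q, p): pc1(q, p) if c == 1 else 1.0 - pc1(q, p)
                   for c, q, p in product(C, Q, p_insts)}

# Constructed CPT of C in G (parents T, Q): read off the reduced CPT at
# the parent instantiation p_t that corresponds to state t.
theta_c = {(c, t, q): theta_c_reduced[c, q, p_insts[t]]
           for c, t, q in product(C, T, Q)}

# Verify: sum_t theta(t | p) theta(c | t, q) = theta'(c | q, p).
for c, q, p in product(C, Q, p_insts):
    lhs = sum(theta_t[t, p] * theta_c[c, t, q] for t in T)
    assert abs(lhs - theta_c_reduced[c, q, p]) < 1e-12
print("CPTs in G' recovered by functionally eliminating T")
```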
Proof of Theorem 2.
We prove the theorem by induction. It suffices to show the following statement: for each causal graph $G$ with observed variables $\mathbf{V}$ and functional variables $\mathbf{F}$, the causal effect $Pr_{\mathbf{x}}(\mathbf{y})$ is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$ iff it is F-identifiable with respect to $\langle G', \mathbf{V}, \mathbf{F}' \rangle$, where $G'$ is the result of functionally eliminating some hidden functional variable $T$ and $\mathbf{F}' = \mathbf{F} \setminus \{T\}$.
We first show the contrapositive of the if part. Suppose $Pr_{\mathbf{x}}(\mathbf{y})$ is not F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$; then, there exist two CBNs $\mathcal{N}_1$ and $\mathcal{N}_2$ that induce distributions $Pr^1$ and $Pr^2$ such that $Pr^1(\mathbf{V}) = Pr^2(\mathbf{V})$ but $Pr^1_{\mathbf{x}}(\mathbf{y}) \neq Pr^2_{\mathbf{x}}(\mathbf{y})$. Let $\mathcal{N}'_1$ and $\mathcal{N}'_2$ be the results of eliminating $T$ from $\mathcal{N}_1$ and $\mathcal{N}_2$, respectively; the two CBNs attain the same marginal distribution on $\mathbf{V}$ but different causal effects according to Theorems A1 and A2. Hence, $Pr_{\mathbf{x}}(\mathbf{y})$ is not F-identifiable with respect to $\langle G', \mathbf{V}, \mathbf{F}' \rangle$ either.
Next, we show the contrapositive of the only-if part. Suppose $Pr_{\mathbf{x}}(\mathbf{y})$ is not F-identifiable with respect to $\langle G', \mathbf{V}, \mathbf{F}' \rangle$; then, there exist two CBNs $\mathcal{N}'_1$ and $\mathcal{N}'_2$ that induce distributions $Pr^1$ and $Pr^2$ such that $Pr^1(\mathbf{V}) = Pr^2(\mathbf{V})$ but $Pr^1_{\mathbf{x}}(\mathbf{y}) \neq Pr^2_{\mathbf{x}}(\mathbf{y})$. We can obtain $\mathcal{N}_1$ and $\mathcal{N}_2$ by, again, considering the construction method outlined in the proof of Theorem 1, where we assign a one-to-one mapping for $T$ and adopt the CPTs from $\mathcal{N}'_1$ and $\mathcal{N}'_2$ for the children of $T$. This way, $\mathcal{N}'_1$ and $\mathcal{N}'_2$ become the results of eliminating $T$ from the constructed $\mathcal{N}_1$ and $\mathcal{N}_2$, respectively. Since $T$ is hidden, the two constructed CBNs induce the same distribution over $\mathbf{V}$ according to Theorem A1 and different causal effects according to Theorem A2. Hence, $Pr_{\mathbf{x}}(\mathbf{y})$ is not F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$ either. □
Proof of Theorem 3.
Since we only functionally eliminate variables that have observed parents, it is guaranteed that each eliminated variable has observed parents when it is eliminated. By induction, it suffices to show that $Pr_{\mathbf{x}}(\mathbf{y})$ is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$ iff it is F-identifiable with respect to $\langle G', \mathbf{V}', \mathbf{F}' \rangle$, where $G'$ is the result of eliminating a single functional variable $Z$ with observed parents from $G$, $\mathbf{V}' = \mathbf{V} \setminus \{Z\}$, and $\mathbf{F}' = \mathbf{F} \setminus \{Z\}$.
We first show the contrapositive of the if part. Suppose $Pr_{\mathbf{x}}(\mathbf{y})$ is not F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$; then, there exist two CBNs $\mathcal{N}_1$ and $\mathcal{N}_2$ that induce distributions $Pr^1$ and $Pr^2$, where $Pr^1(\mathbf{V}) = Pr^2(\mathbf{V})$ but $Pr^1_{\mathbf{x}}(\mathbf{y}) \neq Pr^2_{\mathbf{x}}(\mathbf{y})$. Let $\mathcal{N}'_1$ and $\mathcal{N}'_2$ be the results of eliminating $Z$ from $\mathcal{N}_1$ and $\mathcal{N}_2$, respectively; the two CBNs induce the same marginal distribution on $\mathbf{V}'$ according to Theorem A1 but different causal effects according to Theorem A2. Hence, $Pr_{\mathbf{x}}(\mathbf{y})$ is not F-identifiable with respect to $\langle G', \mathbf{V}', \mathbf{F}' \rangle$.
We now consider the contrapositive of the only-if part. Suppose $Pr_{\mathbf{x}}(\mathbf{y})$ is not F-identifiable with respect to $\langle G', \mathbf{V}', \mathbf{F}' \rangle$; then, there exist two CBNs $\mathcal{N}'_1$ and $\mathcal{N}'_2$ that induce distributions $Pr^1$ and $Pr^2$ such that $Pr^1(\mathbf{V}') = Pr^2(\mathbf{V}')$ but $Pr^1_{\mathbf{x}}(\mathbf{y}) \neq Pr^2_{\mathbf{x}}(\mathbf{y})$. We, again, consider the construction method from the proof of Theorem 1, which produces two CBNs $\mathcal{N}_1$ and $\mathcal{N}_2$. Moreover, $\mathcal{N}'_1$ and $\mathcal{N}'_2$ are the results of eliminating $Z$ from $\mathcal{N}_1$ and $\mathcal{N}_2$, respectively. It is guaranteed that the two constructed CBNs produce different causal effects according to Theorem A2. We need to show that $\mathcal{N}_1$ and $\mathcal{N}_2$ induce the same distribution over the variables of $\mathbf{V}$. Consider any instantiation $\mathbf{v}$ of $\mathbf{V}$. Since $\mathcal{N}_1$ and $\mathcal{N}_2$ assign the same one-to-one mapping between $Z$ and its parents in $G$, it is guaranteed that the probabilities for $Z$ are 0, except for the single state $z_{\mathbf{p}}$ where $\theta(z_{\mathbf{p}}|\mathbf{p}) = 1$, where $\mathbf{p}$ is the parent instantiation of $Z$ consistent with $\mathbf{v}$. By construction, $Pr^1(\mathbf{v}, \mathbf{h}) = Pr^1(\mathbf{v}', \mathbf{h})$ for every instantiation $\mathbf{h}$ of the hidden variables, where $\mathbf{v}'$ is $\mathbf{v}$ with $Z$ removed. Similarly, $Pr^2(\mathbf{v}, \mathbf{h}) = Pr^2(\mathbf{v}', \mathbf{h})$ for every instantiation $\mathbf{h}$. It then follows that $Pr^1(\mathbf{V}) = Pr^2(\mathbf{V})$. This means $Pr_{\mathbf{x}}(\mathbf{y})$ is not F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$ either. □
- Proof of Theorem 4
Lemma A2.
Let $G$ be a causal graph, $\mathbf{V}$ be its observed variables, and $\mathbf{F}$ be its functional variables. Let $Z \in \mathbf{F}$ be a non-descendant of $\mathbf{F} \setminus \{Z\}$ that has at least one hidden parent; then, a causal effect is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \setminus \{Z\} \rangle$ iff it is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$.
Proof.
Let $\mathbf{F}'$ denote the set $\mathbf{F} \setminus \{Z\}$. The only-if part holds immediately due to the fact that every distribution that can possibly be induced with respect to $\mathbf{F}$ can also be induced with respect to $\mathbf{F}'$. Next, we consider the contrapositive of the if part. Suppose a causal effect is not F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F}' \rangle$; then, there exist two CBNs $\mathcal{N}_1$ and $\mathcal{N}_2$ that induce distributions $Pr^1$ and $Pr^2$ such that $Pr^1(\mathbf{V}) = Pr^2(\mathbf{V})$ but $Pr^1_{\mathbf{x}}(\mathbf{y}) \neq Pr^2_{\mathbf{x}}(\mathbf{y})$. Next, we construct $\mathcal{N}'_1$ and $\mathcal{N}'_2$, which constitute an example of unidentifiability with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$. In particular, the CPTs for $Z$ need to be functional in the constructed CBNs.
Without loss of generality, we show the construction of $\mathcal{N}'_1$ from $\mathcal{N}_1$, which involves two steps (the construction of $\mathcal{N}'_2$ from $\mathcal{N}_2$ follows the same procedure). Let $\mathbf{P}$ be the parents of $Z$. The first step constructs a CBN $\mathcal{N}^u_1$ based on the known method that transforms any (non-functional) CPT into a functional CPT. This is done by adding an auxiliary hidden root parent $U$ for $Z$ whose states correspond to the possible functions from $\mathbf{P}$ to $Z$. The CPTs for $U$ and $Z$ are assigned such that $\sum_u \theta(u)\,\theta(z|\mathbf{p}, u) = \theta_1(z|\mathbf{p})$, where $\theta(u)$ and $\theta(z|\mathbf{p}, u)$ are the constructed CPTs in $\mathcal{N}^u_1$. (Each state $u$ of $U$ corresponds to a function $f_u$, where each instantiation $\mathbf{p}$ is mapped to some state $f_u(\mathbf{p})$ of $Z$. Thus, variable $U$ has $|Z|^{n}$ states, where $n$ is the number of instantiations of $\mathbf{P}$, since there is a total of $|Z|^{n}$ possible functions from $\mathbf{P}$ to $Z$. For each instantiation $\mathbf{p}, u$, the functional CPT for $Z$ is defined as $\theta(z|\mathbf{p}, u) = 1$ if $f_u(\mathbf{p}) = z$ and $\theta(z|\mathbf{p}, u) = 0$ otherwise. The CPT for $U$ is assigned as $\theta(u) = \prod_{\mathbf{p}} \theta_1(f_u(\mathbf{p})|\mathbf{p})$.) It follows that $\mathcal{N}_1$ and $\mathcal{N}^u_1$ induce the same distribution over $\mathbf{V}$, since $\sum_u \theta(u)\,\theta(z|\mathbf{p}, u) = \theta_1(z|\mathbf{p})$. The causal effect is also preserved, since the summing out of $U$ is independent of other CPTs in the mutilated CBN for $\mathcal{N}^u_1$.
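This first step is the familiar response-function construction, which is easy to verify numerically. The sketch below uses toy numbers of our own: it builds the auxiliary root $U$ and checks that summing it out recovers the original CPT for $Z$.

```python
from itertools import product

P_INSTS = [0, 1]                      # instantiations of Z's parents
Z = (0, 1)
theta_z = {(z, p): [[0.3, 0.7], [0.9, 0.1]][p][z]
           for z, p in product(Z, P_INSTS)}

# Each state u of U is a function f_u: P_INSTS -> Z, so |U| = |Z|^|P_INSTS|.
U = list(product(Z, repeat=len(P_INSTS)))     # u[p] = f_u(p)
theta_u = {u: 1.0 for u in U}
for u in U:
    for p in P_INSTS:
        theta_u[u] *= theta_z[u[p], p]        # theta(u) = prod_p theta1(f_u(p)|p)
theta_z_func = {(z, p, u): 1.0 if u[p] == z else 0.0
                for z, p, u in product(Z, P_INSTS, U)}

# Summing U out recovers the original CPT: sum_u theta(u) theta(z | p, u).
for z, p in product(Z, P_INSTS):
    s = sum(theta_u[u] * theta_z_func[z, p, u] for u in U)
    assert abs(s - theta_z[z, p]) < 1e-12
print("functional CPT plus root U reproduce the original CPT for Z")
```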
Our second step involves converting the CBN $\mathcal{N}^u_1$ (constructed in the first step) into the CBN $\mathcal{N}'_1$ over the original graph $G$. Let $T$ be the hidden parent of $Z$ in $G$. We merge the auxiliary parent $U$ and $T$ into a new variable $T^\star$ and substitute it for $T$ in $G$, i.e., $T^\star$ has the same parents and children as $T$. $T^\star$ is constructed as the Cartesian product of $U$ and $T$: each state of $T^\star$ is represented as a pair $(u, t)$, where $u$ is a state of $U$ and $t$ is a state of $T$. We are ready to assign new CPTs for $T^\star$ and its children. For each parent instantiation $\mathbf{r}$ of $T^\star$ and each state $(u, t)$ of $T^\star$, we assign the CPT for $T^\star$ in $\mathcal{N}'_1$ as $\theta((u, t)|\mathbf{r}) = \theta(u)\,\theta(t|\mathbf{r})$. Next, consider each child $C$ of $T^\star$ that has parents $\mathbf{Q}$ (excluding $T^\star$). For each instantiation $\mathbf{q}$ of $\mathbf{Q}$ and each state $(u, t)$ of $T^\star$, we assign the CPT for $C$ in $\mathcal{N}'_1$ as $\theta(c|(u, t), \mathbf{q}) = \theta(c|u, t, \mathbf{q})$ for each state $c$, where the right-hand side is the CPT of $C$ in $\mathcal{N}^u_1$. Note that $\theta(c|(u, t), \mathbf{q})$ is functional iff $\theta(c|u, t, \mathbf{q})$ is functional. Hence, the CPTs for the variables in $\mathbf{F}$ are all functional in $\mathcal{N}'_1$.
We need to show that $\mathcal{N}'_1$ preserves the distribution on $\mathbf{V}$ and the causal effect from $\mathcal{N}^u_1$. Let $\mathbf{H}$ be the hidden variables in $\mathcal{N}^u_1$ and $\mathbf{H}'$ be the hidden variables in $\mathcal{N}'_1$. The distribution on $\mathbf{V}$ is preserved, since there is a one-to-one correspondence between each instantiation $\mathbf{h}$ of $\mathbf{H}$ and each instantiation $\mathbf{h}'$ of $\mathbf{H}'$, where the two instantiations agree on $\mathbf{H} \setminus \{U, T\}$ and are assigned the same probability, i.e., $Pr(\mathbf{v}, \mathbf{h}) = Pr(\mathbf{v}, \mathbf{h}')$. Hence, $Pr(\mathbf{v}) = \sum_{\mathbf{h}} Pr(\mathbf{v}, \mathbf{h}) = \sum_{\mathbf{h}'} Pr(\mathbf{v}, \mathbf{h}')$ for every instantiation $\mathbf{v}$. The preservation of the causal effect can be shown similarly but on the mutilated CBNs. Thus, $\mathcal{N}'_1$ preserves both the distribution on $\mathbf{V}$ and the causal effect from $\mathcal{N}_1$. Similarly, we can construct $\mathcal{N}'_2$, which preserves the distribution on $\mathbf{V}$ and the causal effect from $\mathcal{N}_2$. The two CBNs $\mathcal{N}'_1$ and $\mathcal{N}'_2$ constitute an example of unidentifiability with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$. □
Proof of Theorem 4.
We prove the theorem by induction. We first order all the functional variables in $\mathbf{F}$ in a bottom-up order. Let $W_i$ denote the $i$th functional variable in the order and $\mathbf{F}_i$ denote the functional variables that are ordered before $W_i$ (including $W_i$). It follows that we can go over each $W_i$ in the order and show that a causal effect is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F}_i \rangle$ iff it is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F}_{i-1} \rangle$ according to Lemma A2. Since F-identifiability with respect to $\langle G, \mathbf{V}, \mathbf{F}_0 \rangle = \langle G, \mathbf{V}, \emptyset \rangle$ is equivalent to identifiability with respect to $\langle G, \mathbf{V} \rangle$, we conclude that the causal effect is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$ iff it is identifiable with respect to $\langle G, \mathbf{V} \rangle$. □
- Proof of Theorem 5
The proof of the theorem is organized as follows. We start with a lemma (Lemma A3) that allows us to modify the CPT of a variable when the marginal probability over its parents contains zero entries. We then show a main lemma (Lemma A4) that allows us to reduce F-identifiability to identifiability when all functional variables are observed or have a hidden parent. Finally, we prove the theorem based on the main lemma and previous theorems.
Lemma A3.
Consider two CBNs that have the same causal graph and induce distributions $Pr_1$ and $Pr_2$. Suppose the CPTs of the two CBNs only differ by $\theta_{Y|\mathbf{P}}$. Then, $Pr_1 = Pr_2$ iff the two CPTs for $Y$ agree on all instantiations $\mathbf{p}$ of $\mathbf{P}$ where $Pr_1(\mathbf{p}) > 0$ (equivalently, $Pr_2(\mathbf{p}) > 0$).
Proof.
Let $\theta^1_{Y|\mathbf{P}}$ and $\theta^2_{Y|\mathbf{P}}$ denote the CPTs for $Y$ in the first and second CBN, respectively. Let $\mathbf{A}$ be the ancestors of the variables in $\{Y\} \cup \mathbf{P}$ (including $Y$ and $\mathbf{P}$). If we eliminate all variables other than $\mathbf{A}$, then we obtain the factor set $\mathcal{F} \cup \{\theta^1_{Y|\mathbf{P}}\}$ for the first CBN and $\mathcal{F} \cup \{\theta^2_{Y|\mathbf{P}}\}$ for the second CBN. Since all CPTs are the same for the variables in $\mathbf{A} \setminus \{Y\}$, it is guaranteed that the factor set $\mathcal{F}$ is shared by the two CBNs. If we further eliminate the variables other than $\mathbf{P}$ from $\mathcal{F}$, we obtain the marginal distributions $Pr_1(\mathbf{P}) = Pr_2(\mathbf{P}) = \mathrm{proj}_{\mathbf{P}}(\mathcal{F})$, where $\mathrm{proj}_{\mathbf{P}}$ denotes the projection operation that sums out variables other than $\mathbf{P}$ from a factor. Hence, $Pr_1(y, \mathbf{p}) = \theta^1(y|\mathbf{p})\,Pr_1(\mathbf{p})$ and $Pr_2(y, \mathbf{p}) = \theta^2(y|\mathbf{p})\,Pr_2(\mathbf{p})$, which are equal for all $y$ and $\mathbf{p}$ iff the two CPTs agree whenever $Pr_1(\mathbf{p}) > 0$, which concludes the proof. □
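Lemma A3 can be checked numerically on a toy chain. In the sketch below (our own numbers, not from the paper), the two CPTs for $Y$ differ only at a parent state that has probability zero, and the induced joints coincide exactly.

```python
from itertools import product

R, P, Y = (0, 1), (0, 1, 2), (0, 1)
tr = {0: 0.4, 1: 0.6}
tp = {(p, r): [[0.5, 0.5, 0.0], [0.2, 0.8, 0.0]][r][p]
      for p, r in product(P, R)}      # state P = 2 has probability 0
ty1 = {(y, p): [[0.3, 0.7], [0.6, 0.4], [0.5, 0.5]][p][y]
       for y, p in product(Y, P)}
ty2 = dict(ty1)
ty2[0, 2], ty2[1, 2] = 1.0, 0.0       # differs from ty1 only at p = 2

def joint(ty):
    return {(r, p, y): tr[r] * tp[p, r] * ty[y, p]
            for r, p, y in product(R, P, Y)}

assert joint(ty1) == joint(ty2)       # Pr(p = 2) = 0, so nothing changes
print("distributions agree despite different CPTs for Y")
```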
Lemma A4.
If a causal effect is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$ but is not identifiable with respect to $\langle G, \mathbf{V} \rangle$, then there must exist at least one functional variable that is hidden and whose parents are all observed.
Proof.
The lemma is the same as saying that if every functional variable is observed or has a hidden parent, then F-identifiability is equivalent to identifiability. We go over each functional variable in a bottom-up order $W_1, \dots, W_n$ and prove the following inductive statement: a causal effect $Pr_{\mathbf{x}}(\mathbf{y})$ is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F}_i \rangle$ iff it is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F}_{i-1} \rangle$, where $\mathbf{F}_i$ is the subset of variables in $\mathbf{F}$ that are ordered before (and including) $W_i$. Note that $\mathbf{F}_0 = \emptyset$, and F-identifiability with respect to $\langle G, \mathbf{V}, \emptyset \rangle$ collapses into identifiability with respect to $\langle G, \mathbf{V} \rangle$.
The if part follows from the definitions of identifiability and F-identifiability. Next, we consider the contrapositive of the only-if part. Let $Z = W_i$ be the functional variable in $\mathbf{F}_i$ that is considered in the current inductive step. Let $\mathcal{N}_1$ and $\mathcal{N}_2$ be the two CBNs inducing distributions $Pr^1$ and $Pr^2$ that constitute the unidentifiability, i.e., $Pr^1(\mathbf{V}) = Pr^2(\mathbf{V})$ and $Pr^1_{\mathbf{x}}(\mathbf{y}) \neq Pr^2_{\mathbf{x}}(\mathbf{y})$. Our goal is to construct two CBNs $\mathcal{N}'_1$ and $\mathcal{N}'_2$ that induce distributions over $\mathbf{V}$ that agree, that produce different causal effects, and that contain functional CPTs for $Z$. Suppose $Z$ has a hidden parent; we directly employ Lemma A2 to construct the two CBNs. Next, we consider the case when $Z$ is observed and has observed parents. By default, we use $\theta^1_{Z|\mathbf{P}}$ and $\theta^2_{Z|\mathbf{P}}$ to denote the CPTs for $Z$ in $\mathcal{N}_1$ and $\mathcal{N}_2$.
We construct an instance of unidentifiability in the following three steps.
- First Step: We construct $\mathcal{N}'_1$ and $\mathcal{N}'_2$ by modifying the CPTs for $Z$. Let $\mathbf{P}$ be the parents of $Z$ in $G$. For each instantiation $\mathbf{p}$ of $\mathbf{P}$ where $Pr^1(\mathbf{p}) = 0$ (equivalently, $Pr^2(\mathbf{p}) = 0$, since the two distributions agree on $\mathbf{V}$), we modify the entries $\theta^1(Z|\mathbf{p})$ and $\theta^2(Z|\mathbf{p})$ of the CPTs $\theta^1_{Z|\mathbf{P}}$ and $\theta^2_{Z|\mathbf{P}}$ in $\mathcal{N}_1$ and $\mathcal{N}_2$ as follows. Since the causal effects differ, there exists an instantiation $\mathbf{y}$ such that $Pr^1_{\mathbf{x}}(\mathbf{y}) \neq Pr^2_{\mathbf{x}}(\mathbf{y})$. Without loss of generality, assume $Pr^1_{\mathbf{x}}(\mathbf{y}) > Pr^2_{\mathbf{x}}(\mathbf{y})$. Since $Pr^1_{\mathbf{x}}(\mathbf{y})$ is computed as the marginal probability of $\mathbf{y}$ in the mutilated CBN for $\mathcal{N}_1$, it can be expressed in the form of a network polynomial, as shown in [38,47]. If we treat the CPT entries $\theta^1(z_i|\mathbf{p})$ as unknowns, then we can write $Pr^1_{\mathbf{x}}(\mathbf{y})$ as follows:
$Pr^1_{\mathbf{x}}(\mathbf{y}) = \alpha_0 + \sum_i \alpha_i\,\theta^1(z_i|\mathbf{p}),$
where the $\alpha_i$ are constants and the $z_i$ are the states of variable $Z$. Similarly, we can write $Pr^2_{\mathbf{x}}(\mathbf{y})$ as follows:
$Pr^2_{\mathbf{x}}(\mathbf{y}) = \beta_0 + \sum_i \beta_i\,\theta^2(z_i|\mathbf{p}).$
Let $\alpha_{i^\star}$ be the maximum value among the $\alpha_i$ and $\beta_{j^\star}$ be the minimum value among the $\beta_i$; our construction assigns $\theta^1(z_{i^\star}|\mathbf{p}) = 1$ and $\theta^2(z_{j^\star}|\mathbf{p}) = 1$ (and 0 for the other entries). By construction, it is guaranteed that $\tilde{Pr}^1_{\mathbf{x}}(\mathbf{y}) \geq Pr^1_{\mathbf{x}}(\mathbf{y}) > Pr^2_{\mathbf{x}}(\mathbf{y}) \geq \tilde{Pr}^2_{\mathbf{x}}(\mathbf{y})$, where $\tilde{Pr}^1_{\mathbf{x}}(\mathbf{y})$ and $\tilde{Pr}^2_{\mathbf{x}}(\mathbf{y})$ denote the causal effects under the updated CPTs $\tilde{\theta}^1_{Z|\mathbf{P}}$ and $\tilde{\theta}^2_{Z|\mathbf{P}}$, respectively. We repeat the above procedure for all $\mathbf{p}$ where $Pr(\mathbf{p}) = 0$, which yields the new CBNs $\mathcal{N}'_1$ and $\mathcal{N}'_2$, in which $\tilde{\theta}^1(Z|\mathbf{p})$ and $\tilde{\theta}^2(Z|\mathbf{p})$ are functional whenever $Pr(\mathbf{p}) = 0$. Next, we show that $\mathcal{N}'_1$ and $\mathcal{N}'_2$ (with the updated CPTs) constitute an example of unidentifiability. The causal effects differ, since $\tilde{Pr}^1_{\mathbf{x}}(\mathbf{y}) > \tilde{Pr}^2_{\mathbf{x}}(\mathbf{y})$ for the particular instantiation $\mathbf{y}$. We are left to show that the distributions of $\mathcal{N}'_1$ and $\mathcal{N}'_2$ are the same over the observed variables $\mathbf{V}$. Consider each instantiation $\mathbf{v}$ of $\mathbf{V}$ and $\mathbf{p}$ of $\mathbf{P}$, where $\mathbf{p}$ is consistent with $\mathbf{v}$. If $Pr(\mathbf{p}) = 0$, then the distributions are unchanged according to Lemma A3; thus, they remain equal. Otherwise, they remain equal as well, since none of the CPT entries consistent with $\mathbf{v}$ was modified. (A numeric illustration of this network polynomial argument is given after this proof.)
- Second Step: We construct $\mathcal{N}''_1$ and $\mathcal{N}''_2$ from $\mathcal{N}'_1$ and $\mathcal{N}'_2$, respectively, by introducing an auxiliary root parent for $Z$ and assigning a functional CPT for $Z$. We add a root variable, denoted $R$, to be an auxiliary parent of $Z$, which specifies all possible functions from $\mathbf{P}^+$ to $Z$, where $\mathbf{P}^+$ contains all instantiations $\mathbf{p}$ of $\mathbf{P}$ where $Pr(\mathbf{p}) > 0$. Each state $r$ of $R$ corresponds to a function $f_r$, where each instantiation $\mathbf{p} \in \mathbf{P}^+$ is mapped to some state $f_r(\mathbf{p})$ of $Z$. Thus, the $R$ variable has $|Z|^{|\mathbf{P}^+|}$ states, since there is a total of $|Z|^{|\mathbf{P}^+|}$ possible functions from $\mathbf{P}^+$ to $Z$. For each instantiation $\mathbf{p}, r$: if $Pr(\mathbf{p}) > 0$, we define $\theta^1(z|\mathbf{p}, r) = 1$ if $f_r(\mathbf{p}) = z$ and $\theta^1(z|\mathbf{p}, r) = 0$ otherwise. If $Pr(\mathbf{p}) = 0$, we define $\theta^1(z|\mathbf{p}, r) = \tilde{\theta}^1(z|\mathbf{p})$. The CPT for $R$ is assigned as $\theta(r) = \prod_{\mathbf{p} \in \mathbf{P}^+} \tilde{\theta}^1(f_r(\mathbf{p})|\mathbf{p})$. Moreover, we remove all the states $r$ of $R$ where $\theta(r) = 0$, which ensures $\theta(r) > 0$ for all remaining $r$. We assign the CPT $\theta^2_{Z|\mathbf{P},R}$ in $\mathcal{N}''_2$ in a similar way. Note that the two constructions yield the same variable $R$ and the same CPT $\theta(r)$, since $\tilde{\theta}^1(z|\mathbf{p}) = \tilde{\theta}^2(z|\mathbf{p})$ for each $\mathbf{p}$ where $Pr(\mathbf{p}) > 0$.
Next, we show that $\mathcal{N}''_1$ and $\mathcal{N}''_2$ constitute unidentifiability. One key observation is that $\mathcal{N}''_1$ and $\mathcal{N}'_1$ (and, similarly, $\mathcal{N}''_2$ and $\mathcal{N}'_2$) agree on the distribution over $\mathbf{V}$. Consider each instantiation $\mathbf{v}$ that contains an instantiation $\mathbf{p}$ of $\mathbf{P}$ and the state $z$ of $Z$. Suppose $Pr(\mathbf{p}) > 0$; then, $\sum_r \theta(r)\,\theta^1(z|\mathbf{p}, r) = \tilde{\theta}^1(z|\mathbf{p})$ by the construction of $\theta(r)$. Suppose $Pr(\mathbf{p}) = 0$; then, $\theta^1(z|\mathbf{p}, r) = \tilde{\theta}^1(z|\mathbf{p})$ for every $r$, and the equality holds as well. In both cases, the two constructed CBNs induce the same distribution over $\mathbf{V}$, since $\mathcal{N}''_1$ and $\mathcal{N}''_2$ assign the same function $f_r$ for each state $r$ of $R$ and the same CPT $\theta(r)$. With respect to the causal effects, note that summing out $R$ from $\mathcal{N}''_1$ and $\mathcal{N}''_2$ yields $\mathcal{N}'_1$ and $\mathcal{N}'_2$, respectively, so the causal effects still differ.
- Third Step: We construct $\mathcal{N}'''_1$ and $\mathcal{N}'''_2$ from $\mathcal{N}''_1$ and $\mathcal{N}''_2$, respectively, by merging the auxiliary root variable $R$ with an observed parent $T$ of $Z$. We merge $R$ and $T$ into a new node $T^\star$ and substitute it for $T$ in $G$, i.e., $T^\star$ has the same parents and children as $T$ in $G$. Specifically, $T^\star$ is constructed as the Cartesian product of $R$ and $T$: each state of $T^\star$ can be represented as $(r, t)$, where $r$ is a state of $R$ and $t$ is a state of $T$. We then assign the CPT for $T^\star$ in $\mathcal{N}'''_1$ as follows. For each parent instantiation $\mathbf{q}_T$ and each state $(r, t)$ of $T^\star$, $\theta((r, t)|\mathbf{q}_T) = \theta(r)\,\theta(t|\mathbf{q}_T)$. For each child $C$ of $T^\star$ that has parents $\mathbf{Q}$ (excluding $T^\star$), the CPT for $C$ in $\mathcal{N}'''_1$ is assigned as $\theta(c|(r, t), \mathbf{q}) = \theta(c|r, t, \mathbf{q})$. Similarly, we assign the CPTs in $\mathcal{N}'''_2$. The distributions over observed variables are preserved, since there is a one-to-one correspondence between the instantiations over $\{R, T\}$ in $\mathcal{N}''_1$ and the instantiations over $T^\star$ in $\mathcal{N}'''_1$. Next, we consider the causal effect. Suppose $T$ is neither a treatment nor an outcome variable; then, merging does not affect the causal effect, again owing to the one-to-one correspondence between instantiations $(r, t)$ and states of $T^\star$. Suppose $T$ is an outcome variable; since the causal effects of $\mathcal{N}''_1$ and $\mathcal{N}''_2$ differ, there exists a state $t$ of $T$ consistent with $\mathbf{y}$ for which they differ. This implies that the causal effects differ for the particular instantiation $\mathbf{y}$ and the corresponding states $(r, t)$ of $T^\star$. Suppose $T$ is the treatment variable $X$; since the causal effects differ, there exists a state $t$ of $T$ consistent with $\mathbf{x}$ for which they differ. Moreover, there exists a state $r$ of $R$ with $\theta(r) > 0$ (otherwise, $R$ would have no remaining states). Since $R$ is a root in the mutilated CBN, treating $T^\star = (r, t)$ in $G$ yields the same marginal over the outcome variables as treating $T = t$ in $\mathcal{N}''_1$, so the causal effects still differ under treatment $(r, t)$. Moreover, $Pr((r, t)) = \theta(r)\,Pr(t) > 0$ according to the positivity assumption on the treatment. Thus, the positivity still holds for $\mathcal{N}'''_1$ and, similarly, for $\mathcal{N}'''_2$. □
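As promised above, here is a numeric illustration of the network polynomial argument from the first step. The sketch uses our own toy model $P \to Z \to Y$ (standing in for the mutilated CBN): with every other CPT fixed, the query is an affine function of the CPT entries $\theta(z \mid p)$ for a fixed $p$, so it attains its extreme values at an indicator (i.e., functional) row, which is precisely why the construction can assign point-mass entries.

```python
from itertools import product

P, Z, Y = (0, 1), (0, 1, 2), (0, 1)
tp = {0: 0.4, 1: 0.6}
ty = {(y, z): [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]][z][y]
      for y, z in product(Y, Z)}

def query(tz, p0_row):                # Pr(Y = 1) with theta(z | p=0) = p0_row
    tz = dict(tz)
    tz.update({(z, 0): p0_row[z] for z in Z})
    return sum(tp[p] * tz[z, p] * ty[1, z] for p, z in product(P, Z))

tz = {(z, p): [[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]][p][z]
      for z, p in product(Z, P)}
row = [0.2, 0.3, 0.5]                 # current entries theta(z | p = 0)

# Query at the indicator rows gives the affine "basis" values; any row is
# then matched by the corresponding convex combination of those values.
basis = [query(tz, [1.0 if i == z else 0.0 for i in Z]) for z in Z]
assert abs(query(tz, row) - sum(r * b for r, b in zip(row, basis))) < 1e-12
print("extreme values at indicator rows:", max(basis), min(basis))
```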
Proof of Theorem 5.
Let $\mathbf{W}$ be a set of hidden functional variables that are functionally determined by $\mathbf{V}$. According to Theorem 2, the causal effect $Pr_{\mathbf{x}}(\mathbf{y})$ is F-identifiable with respect to $\langle G, \mathbf{V}, \mathbf{F} \rangle$ iff it is F-identifiable with respect to $\langle G', \mathbf{V}, \mathbf{F} \setminus \mathbf{W} \rangle$, where $G'$ is the result of functionally eliminating $\mathbf{W}$ from $G$. By construction, every variable in $\mathbf{W}$ has parents in $\mathbf{V}$ when it is eliminated; hence, according to Theorem 3, each elimination step also preserves F-identifiability when the eliminated variable is excluded from observation. If we consider each remaining functional variable $W \in \mathbf{F} \setminus \mathbf{W}$, it is either in $\mathbf{V}$ or has a parent that is not in $\mathbf{V}$ (otherwise, $W$ would have been added to $\mathbf{W}$). According to Lemma A4, the causal effect is F-identifiable with respect to $\langle G', \mathbf{V}, \mathbf{F} \setminus \mathbf{W} \rangle$ iff it is identifiable with respect to $\langle G', \mathbf{V} \rangle$. □
References
1. Pearl, J.; Mackenzie, D. The Book of Why: The New Science of Cause and Effect; Basic Books: New York, NY, USA, 2018.
2. Pearl, J. Causality: Models, Reasoning, and Inference, 2nd ed.; Cambridge University Press: Cambridge, UK, 2009.
3. Pearl, J. Causal Diagrams for Empirical Research. Biometrika 1995, 82, 669–688.
4. Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction, and Search, 2nd ed.; Adaptive Computation and Machine Learning; MIT Press: Cambridge, MA, USA, 2000.
5. Imbens, G.W.; Rubin, D.B. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction; Cambridge University Press: Cambridge, UK, 2015.
6. Peters, J.; Janzing, D.; Schölkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms; MIT Press: Cambridge, MA, USA, 2017.
7. Hernán, M.A.; Robins, J.M. Causal Inference: What If; Chapman & Hall/CRC: Boca Raton, FL, USA, 2020.
8. Pearl, J. [Bayesian Analysis in Expert Systems]: Comment: Graphical Models, Causality and Intervention. Stat. Sci. 1993, 8, 266–269.
9. Huang, Y.; Valtorta, M. Identifiability in Causal Bayesian Networks: A Sound and Complete Algorithm. In Proceedings of the AAAI; AAAI Press: Menlo Park, CA, USA, 2006; pp. 1149–1154.
10. Shpitser, I.; Pearl, J. Identification of Joint Interventional Distributions in Recursive Semi-Markovian Causal Models. In Proceedings of the AAAI; AAAI Press: Menlo Park, CA, USA, 2006; pp. 1219–1226.
11. Zaffalon, M.; Antonucci, A.; Cabañas, R. Causal Expectation-Maximisation. arXiv 2020, arXiv:2011.02912.
12. Zaffalon, M.; Antonucci, A.; Cabañas, R.; Huber, D. Approximating counterfactual bounds while fusing observational, biased and randomised data sources. Int. J. Approx. Reason. 2023, 162, 109023.
13. Darwiche, A. Causal Inference with Tractable Circuits. arXiv 2021, arXiv:2202.02891.
14. Huber, D.; Chen, Y.; Antonucci, A.; Darwiche, A.; Zaffalon, M. Tractable Bounding of Counterfactual Queries by Knowledge Compilation. In Proceedings of the Sixth Workshop on Tractable Probabilistic Modeling @ UAI 2023, Pittsburgh, PA, USA, 4 August 2023.
15. Jung, Y.; Tian, J.; Bareinboim, E. Estimating Causal Effects Using Weighting-Based Estimators. In Proceedings of the AAAI; AAAI Press: Menlo Park, CA, USA, 2020; pp. 10186–10193.
16. Jung, Y.; Tian, J.; Bareinboim, E. Learning Causal Effects via Weighted Empirical Risk Minimization. Adv. Neural Inf. Process. Syst. 2020, 33, 12697–12709.
17. Jung, Y.; Tian, J.; Bareinboim, E. Estimating Identifiable Causal Effects through Double Machine Learning. In Proceedings of the AAAI; AAAI Press: Menlo Park, CA, USA, 2021; pp. 12113–12122.
18. Jung, Y.; Tian, J.; Bareinboim, E. Estimating Identifiable Causal Effects on Markov Equivalence Class through Double Machine Learning. In Proceedings of Machine Learning Research, Online, 18–24 July 2021; Volume 139, pp. 5168–5179.
19. Tikka, S.; Hyttinen, A.; Karvanen, J. Identifying Causal Effects via Context-specific Independence Relations. In Proceedings of the NeurIPS, Vancouver, BC, Canada, 8–14 December 2019; pp. 2800–2810.
20. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Morgan Kaufmann: San Mateo, CA, USA, 1988.
21. Darwiche, A. An Advance on Variable Elimination with Applications to Tensor-Based Computation. In Proceedings of the 24th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 29 August–8 September 2020; Frontiers in Artificial Intelligence and Applications; IOS Press: Amsterdam, The Netherlands, 2020; Volume 325, pp. 2559–2568.
22. Chen, Y.; Choi, A.; Darwiche, A. Supervised Learning with Background Knowledge. In Proceedings of the 10th International Conference on Probabilistic Graphical Models (PGM), Skørping, Denmark, 23–25 September 2020.
23. Chen, Y.; Darwiche, A. On the Definition and Computation of Causal Treewidth. In Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence (UAI), Eindhoven, The Netherlands, 1–5 August 2022.
24. Han, Y.; Chen, Y.; Darwiche, A. On the Complexity of Counterfactual Reasoning. arXiv 2022, arXiv:2211.13447.
25. Verma, T.S. Graphical Aspects of Causal Models; Technical Report R-191; UCLA: Los Angeles, CA, USA, 1993.
26. Tian, J.; Pearl, J. On the Testable Implications of Causal Models with Hidden Variables. In Proceedings of the UAI, Edmonton, AB, Canada, 1–4 August 2002; Morgan Kaufmann: San Francisco, CA, USA, 2002; pp. 519–527.
27. Chen, Y.; Darwiche, A. Identifying Causal Effects Under Functional Dependencies. In Proceedings of the NeurIPS, Vancouver, BC, Canada, 10–15 December 2024.
28. Tian, J.; Pearl, J. On the Identification of Causal Effects; Technical Report R-290-L; UCLA: Los Angeles, CA, USA, 2003.
29. Kivva, Y.; Mokhtarian, E.; Etesami, J.; Kiyavash, N. Revisiting the general identifiability problem. In Proceedings of the UAI, Eindhoven, The Netherlands, 1–5 August 2022; Proceedings of Machine Learning Research; Volume 180, pp. 1022–1030.
30. Hwang, I.; Choe, Y.; Kwon, Y.; Lee, S. On Positivity Condition for Causal Inference. In Proceedings of the ICML, Vienna, Austria, 21–27 July 2024.
31. Balke, A.; Pearl, J. Counterfactuals and Policy Analysis in Structural Models. In Proceedings of the UAI, San Francisco, CA, USA, 18–20 August 1995; Morgan Kaufmann: San Mateo, CA, USA, 1995; pp. 11–18.
32. Galles, D.; Pearl, J. An axiomatic characterization of causal counterfactuals. Found. Sci. 1998, 3, 151–182.
33. Halpern, J.Y. Axiomatizing causal reasoning. J. Artif. Intell. Res. 2000, 12, 317–337.
34. Halpin, T.A.; Morgan, T. Information Modeling and Relational Databases, 2nd ed.; Morgan Kaufmann: San Mateo, CA, USA, 2008.
35. Date, C.J. Database Design and Relational Theory—Normal Forms and All That Jazz; O'Reilly: Sebastopol, CA, USA, 2012.
36. Zhang, N.L.; Poole, D.L. Exploiting Causal Independence in Bayesian Network Inference. J. Artif. Intell. Res. 1996, 5, 301–328.
37. Dechter, R. Bucket Elimination: A unifying framework for probabilistic inference. In Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence (UAI), Portland, OR, USA, 1–4 August 1996; pp. 211–219.
38. Darwiche, A. Modeling and Reasoning with Bayesian Networks; Cambridge University Press: Cambridge, UK, 2009.
39. Geiger, D.; Pearl, J. On the logic of causal models. In Proceedings of the UAI, Minneapolis, MN, USA, 10–12 July 1988; North-Holland: Amsterdam, The Netherlands, 1988; pp. 3–14.
40. Geiger, D.; Verma, T.; Pearl, J. Identifying independence in Bayesian networks. Networks 1990, 20, 507–534.
41. Pearl, J. Fusion, Propagation, and Structuring in Belief Networks. Artif. Intell. 1986, 29, 241–288.
42. Verma, T.; Pearl, J. Causal networks: Semantics and expressiveness. In Proceedings of the UAI, Minneapolis, MN, USA, 10–12 July 1988; North-Holland: Amsterdam, The Netherlands, 1988; pp. 69–78.
43. Tikka, S.; Karvanen, J. Enhancing Identification of Causal Effects by Pruning. J. Mach. Learn. Res. 2017, 18, 194:1–194:23.
44. van der Zander, B.; Liskiewicz, M.; Textor, J. Separators and adjustment sets in causal graphs: Complete criteria and an algorithmic framework. Artif. Intell. 2019, 270, 1–40.
45. Erdős, P.; Rényi, A. On Random Graphs. Publ. Math. Debr. 1959, 6, 290–297.
46. Shpitser, I.; Pearl, J. Complete Identification Methods for the Causal Hierarchy. J. Mach. Learn. Res. 2008, 9, 1941–1979.
47. Darwiche, A. A differential approach to inference in Bayesian networks. J. ACM 2003, 50, 280–305.