Causal Effect Identification by Adjustment under
Confounding and Selection Biases
Juan D. Correa
Purdue [email protected]
Elias Bareinboim
Purdue [email protected]
Abstract
Controlling for selection and confounding biases are two of the most challenging problems in the empirical sciences as well as in artificial intelligence tasks. Covariate adjustment (or, backdoor adjustment) is the most pervasive technique used for controlling confounding bias, but it is oblivious to issues of sampling selection. In this paper, we introduce a generalized version of covariate adjustment that simultaneously controls for both confounding and selection biases. We first derive a sufficient and necessary condition for recovering causal effects using covariate adjustment from an observational distribution collected under preferential selection. We then relax this setting to consider cases when additional, unbiased measurements over a set of covariates are available for use (e.g., the age and gender distribution obtained from census data). Finally, we present a complete algorithm with polynomial delay to find all sets of covariates admissible for adjustment when confounding and selection biases are simultaneously present and unbiased data is available.
Introduction
One of the central challenges in data-driven fields is to compute the effect of interventions – for instance, how increasing the educational budget will affect violence rates in a city, whether treating patients with a certain drug will help their recovery, or how increasing the product price will change monthly sales. These questions are commonly referred to as the problem of identification of causal effects. There are two types of systematic bias that pose obstacles to this kind of inference, namely confounding bias and selection bias. The former refers to the presence of a set of factors that affect both the action (also known as treatment) and the outcome (Pearl 1993), while the latter arises when the action, outcome, or other factors differentially affect the inclusion of subjects in the data sample (Bareinboim and Pearl 2016).
The goal of our analysis is to produce an unbiased estimand of the causal effect, specifically, the probability distribution of the outcome when an action is performed by an autonomous agent (e.g., FDA, robot), regardless of how the decision would naturally occur (Pearl 2000, Ch. 1). For example, consider the graph in Fig. 1(a), in which X represents
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
a treatment (e.g., taking a drug or not), Y represents an outcome (health status), and Z is a factor (e.g., gender, age) that affects both the propensity of being treated and the outcome. The edges (Z, X) and (Z, Y) may encode the facts "gender affects how the drug is being prescribed" and "gender affects recovery," respectively – for example, females may be more health conscious, so they seek treatment more frequently than their male counterparts and, at the same time, are less likely to develop large complications for the particular disease. Intuitively, the causal effect represents the variations of X that bring about change in Y regardless of the influence of Z on X, which is graphically represented in Fig. 1(b). Mutilation is the graphical operation of removing arrows that represents a decision, made by an autonomous agent, to set a variable to a certain value. The mathematical counterpart of mutilation is the do() operator, and the average causal effect of X on Y is usually written in terms of the do-distribution P(y | do(x)) (Pearl 2000, Ch. 1).
The gold standard for obtaining the do-distribution is through the use of randomization, where the treatment assignment is selected by a randomized device (e.g., a coin flip) regardless of any other set of covariates (Z). In fact, this operation physically transforms the reality of the underlying population (Fig. 1(a)) into the corresponding mutilated world (Fig. 1(b)): the effect of Z on X is neutralized once randomization is applied. Despite its effectiveness, randomized studies can be prohibitively expensive, and even unattainable in certain cases, for technical or ethical reasons – e.g., one cannot randomize the cholesterol level of a patient and record whether it causes the heart to stop when trying to assess the effect of cholesterol level on cardiac failure.
An alternative way of computing causal effects is to relate non-experimentally collected samples (drawn from P(z, x, y)) to the experimental distribution (P(y | do(x))). Non-experimental (often called observational) data relates to the model in Fig. 1(a), where subjects decide by themselves whether to take the drug (X) while influenced by other factors (Z). There are a number of techniques developed for this task, the most general of which is known as do-calculus (Pearl 1995). In practice, one particular strategy from do-calculus, called adjustment, is used the most. It consists of averaging the effect of X on Y over the different levels of Z, isolating
[Technical Report R-24-L. First version: January 2017; revised June 2017.]
the effect of interest from the effect induced by other factors. Controlling for confounding bias by adjustment is currently the standard method for inferring causal effects in data-driven fields, and different properties and enhancements have been studied in statistics (Rubin 1974; Robinson and Jewell 1991; Pirinen, Donnelly, and Spencer 2012; Mefford and Witte 2012) and AI (Pearl 1993; 1995; Pearl and Paz 2010; Shpitser, VanderWeele, and Robins 2010; Maathuis and Colombo 2015; van der Zander, Liskiewicz, and Textor 2014).
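To make the contrast between observing and intervening concrete, here is a small numeric sketch of a model shaped like Fig. 1(a). All probability values are invented for illustration; P_Z, P_X1_Z, and p_y1 are our own hypothetical parametrization, not numbers from the paper.

```python
# Hypothetical parametrization of a Fig. 1(a)-shaped model: Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}            # P(Z=z)
P_X1_Z = {0: 0.2, 1: 0.8}         # P(X=1 | Z=z): Z raises the propensity to be treated

def p_y1(x, z):                   # P(Y=1 | X=x, Z=z)
    return 0.2 + 0.5 * x + 0.25 * z

# Intervening mutilates Z -> X (Fig. 1(b)), so Z keeps its marginal distribution:
p_do = sum(P_Z[z] * p_y1(1, z) for z in (0, 1))        # P(Y=1 | do(X=1)) ~ 0.825

# Observing: conditioning on X=1 makes Z informative about the treated units:
p_x1 = sum(P_Z[z] * P_X1_Z[z] for z in (0, 1))
p_obs = sum(P_Z[z] * P_X1_Z[z] * p_y1(1, z) for z in (0, 1)) / p_x1   # ~ 0.9

# Adjustment averages P(y | x, z) over the marginal of Z, recovering do():
p_adj = sum(p_y1(1, z) * P_Z[z] for z in (0, 1))
assert abs(p_adj - p_do) < 1e-9 and abs(p_obs - p_do) > 0.05
```

In this toy parametrization, the naive conditional overstates the effect because treated units are disproportionately drawn from the high-recovery Z=1 stratum, while adjusting for Z reproduces the interventional quantity.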
Orthogonal to confounding, sampling selection bias is induced by preferential selection of units for the dataset, which is usually governed by unknown factors including treatment, outcome, and their consequences. It cannot be removed by a randomized trial and may stay undetected during the data-gathering process, the whole study, or simply never be detected.[1] Consider Fig. 1(e), where X and Y again represent treatment and outcome, but S represents a binary variable that indicates whether a subject is included in the pool (S=1 means that the unit is in the sample, S=0 otherwise). The effect of X on Y in the entire population (P(y | do(x))) is usually not the same as in the sample (P(y | do(x), S=1)). For instance, patients that went to the hospital and were sampled are perhaps more affluent and have better nutrition than the average person in the population, which can lead to a faster recovery. This preferential selection of samples challenges the validity of inferences in several tasks in AI (Cooper 1995; Cortes et al. 2008; Zadrozny 2004) and statistics (Little and Rubin 1986; Kuroki and Cai 2006) as well as in the empirical sciences (Heckman 1979; Angrist 1997; Robins 2001).
The problem of selection bias can be addressed by removing the influence of the biased sampling mechanism on the outcome, as if a random sample of the population had been taken. For the graph in Fig. 1(d), for example, the distribution P(y | do(x)) is equal to P(y | x, S=1), because there are no external factors that affect X and the selection mechanism S is independent of the outcome Y once the treatment X is accounted for. There exists a complete non-parametric[2] solution for the problem of estimating statistical quantities from selection-biased datasets (Bareinboim and Pearl 2012), and also sufficient and algorithmic conditions for recovering from selection in the context of causal inference (Bareinboim, Tian, and Pearl 2014; Bareinboim and Tian 2015).
Both confounding and selection biases carry extraneous "flow" of information between treatment and outcome, which is usually deemed "spurious correlation" since it does not correspond to the effect we want to compute. Despite all the progress made in controlling these biases separately, we show that estimating causal effects in the presence of both problems requires a more refined analysis. First, note that the effect of X on Y can be estimated by blocking confounding and controlling for selection, respectively,
[1] (Zhang 2008) noticed some interesting cases where detection is feasible in a class of non-chordal graphs.
[2] No assumptions are made about the functions that relate the variables (e.g., linearity, monotonicity).
[Figure 1 appears here: six causal diagrams, panels (a)-(f).]
Figure 1: (a) and (d) give simple examples of confounding and selection bias, respectively. (b) represents the model in (a) after an intervention is performed on X. (c) and (e) present examples where confounding and selection bias, respectively, cannot be removed. In (f) we can control for either confounding or selection bias, but not for both, unless we have external data on P(z).
in Figs. 1(a) and (d). On the other hand, confounding cannot be removed in Fig. 1(c), nor can the effect be recovered from selection bias in Fig. 1(e). Perhaps surprisingly, Fig. 1(f) presents a scenario where either confounding or selection can be addressed separately (P(y | do(x)) = Σ_Z P(y | x, z) P(z) and P(z, y | do(x)) = P(z, y | do(x), S=1)), but not simultaneously (without external data). As this example suggests, there is an intricate connection between these two biases that prevents the methods developed for each problem from being applied independently and then combined.
In this paper, we study the problem of estimating causal effects from models with arbitrary structure that involve both biases. We establish necessary and sufficient conditions that a set of variables should fulfill so as to guarantee that the target effect can be unbiasedly estimated by adjustment. We consider two settings – first, when only biased data is available, and then a more relaxed setting where additional unbiased samples of covariates are available for use (e.g., census data). Specifically, we solve the following problems:
1. Identification and recoverability without external data: The data is collected under selection bias, P(v | S=1); when does a set of covariates Z allow P(y | do(x)) to be estimated by adjusting for Z?
2. Identification and recoverability with external data: The data is collected under selection bias, P(v | S=1), and unbiased samples of P(t), T ⊆ V, are available. When does a set of covariates Z ⊆ T license the estimation of P(y | do(x)) by adjusting for Z?
3. Finding admissible adjustment sets with external data: How can we list all admissible sets Z capable of identifying and recovering P(y | do(x)), for Z ⊆ T ⊆ V?
Preliminaries
The systematic analysis of confounding and selection biases requires a formal language in which the characterization of the underlying data-generating model can be encoded explicitly.
We use the language of Structural Causal Models (SCM) (Pearl 2000, pp. 204-207). Formally, an SCM M is a 4-tuple ⟨U, V, F, P(u)⟩, where U is a set of exogenous (latent) variables and V is a set of endogenous (measured) variables. F represents a collection of functions F = {f_i} such that each endogenous variable V_i ∈ V is determined by a function f_i ∈ F, where f_i is a mapping from the respective domain of U_i ∪ Pa_i to V_i, with U_i ⊆ U and Pa_i ⊆ V \ V_i (Pa_i being the set of endogenous variables that are arguments of f_i); the entire set F forms a mapping from U to V. Uncertainty is encoded through a probability distribution over the exogenous variables, P(u). Within the structural semantics, performing an action X=x is represented through the do-operator, do(X=x), which encodes the operation of replacing the original equation of X by the constant x and induces a submodel M_x. For a detailed discussion of the properties of structural models, we refer readers to (Pearl 2000, Ch. 7).
We will represent sets of variables in bold. The causal effect of a set X, when it is assigned a set of values x, on a set Y, when it is instantiated as y, will be written as P(y | do(x)), which is a shorthand notation for P(Y=y | do(X=x)). Mainly, we will operate with P(v), P(v | do(x)), and P(v | S=1) – respectively, the observational, experimental, and selection-biased distributions.
Formally, the task of estimating a probabilistic quantity from a selection-biased distribution is known as recovering from selection bias (Bareinboim and Pearl 2012). It is not uncommon for observations of a subset of the variables over the entire population (unbiased data) to be available for use. Therefore, we will consider two subsets of V, M, T ⊆ V, where M contains the variables for which data was collected under selection bias, and T encompasses the variables observed in the overall population, without bias. The absence of unbiased data is equivalent to having T = ∅.
Selection Bias with Adjustment
The main justification for the validity of adjustment for confounding comes from a graphical condition called the "backdoor criterion" (Pearl 1993; 2000), shown below:

Definition 1 (Backdoor Criterion (Pearl 2000)). A set of variables Z satisfies the backdoor criterion relative to a pair of variables (X, Y) in a directed acyclic graph G if:
(i) No node in Z is a descendant of X.
(ii) Z blocks every path between X and Y that contains an arrow into X.
The heart of the criterion lies in condition (ii), where the set Z is required to block all the backdoor paths between X and Y that generate confounding bias. Furthermore, condition (i) forbids the inclusion of descendants of X in Z, which is intended to avoid opening new non-causal paths. For example, the empty set is admissible for adjustment in Fig. 1(e), but adding S would not be allowed, since it is a descendant of X and opens the non-causal path X → S ← Y. On the other hand, even though S does not open any non-causal path in Fig. 1(f), the criterion does not allow it to be used for adjustment.
[Figure 2 appears here.]
Figure 2: A graph that does not satisfy the s-backdoor criterion (with respect to Z), but where the adjustment formula is recoverable and corresponds to the desired causal effect.
(Bareinboim, Tian, and Pearl 2014) noticed that adjustment could be used for controlling for selection bias, in addition to confounding, which led to a sufficient graphical condition called the selection-backdoor criterion.

Definition 2 (Selection-Backdoor Criterion (Bareinboim and Tian 2015)). A set Z = Z+ ∪ Z−, with Z− ⊆ De_X and Z+ ⊆ V \ De_X (where De_X is the set of variables that are descendants of X in G), satisfies the selection-backdoor criterion (s-backdoor, for short) relative to X, Y and M, T in a directed acyclic graph G if:
(i) Z+ blocks all backdoor paths from X to Y;
(ii) X and Z+ block all paths between Z− and Y, namely, (Z− ⊥⊥ Y | X, Z+);
(iii) X and Z block all paths between S and Y, namely, (Y ⊥⊥ S | X, Z);
(iv) Z ∪ {X, Y} ⊆ M and Z ⊆ T.

The first two conditions echo the extended-backdoor (Pearl and Paz 2010),[3] while conditions (iii) and (iv) guarantee that the resulting expression is estimable from the available datasets. If the s-backdoor criterion holds for Z relative to X, Y and M, T in G, then the effect P(y | do(x)) is identifiable, recoverable, and given by

P(y | do(x)) = Σ_Z P(y | x, z, S=1) P(z)    (1)
We note here that the s-backdoor criterion is sufficient but not necessary for adjustment. To witness, consider the model in Fig. 2, where Z = {Z1, Z2}, M = {X, Y, Z1, Z2}, and T = {Z1, Z2}. Here, Z+ = ∅ and Z− = {Z1, Z2}. Condition (ii) in Def. 2 is violated, namely (Z1, Z2 ⊥̸⊥ Y | X). Perhaps surprisingly, the effect P(y | do(x)) is identifiable and recoverable, as follows:

P(y | do(x)) = P(y | x)                                           (2)
             = P(y | x) Σ_{Z1} P(z1)                              (3)
             = Σ_{Z1} P(y | x, z1) P(z1)                          (4)
             = Σ_{Z1,Z2} P(y | x, z1, z2) P(z2 | x, z1) P(z1)     (5)
             = Σ_{Z1,Z2} P(y | x, z1, z2) P(z2 | z1) P(z1)        (6)
             = Σ_{Z1,Z2} P(y | x, z1, z2, S=1) P(z1, z2)          (7)
[3] The extended-backdoor augments the backdoor criterion to allow for descendants of X that could be harmless in terms of bias.
Eq. (2) follows from the application of the second rule of do-calculus and the independence (X ⊥⊥ Y)_{G_X̲}. Equations (4), (6), and (7) use the independences (Y ⊥⊥ Z1 | X), (Z2 ⊥⊥ X | Z1), and (S ⊥⊥ Y | X, Z1, Z2), respectively. The final expression (7) is estimable from the available data.
Considering that Z = ∅ already controls for confounding, adjusting for Z = {Z1, Z2} seems unnecessary. As it turns out, covariates irrelevant for confounding control can play a role when we compound this task with recovering from selection bias (where Y needs to be separated from S).
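The algebra in (2)-(7) can be spot-checked numerically. The snippet below builds one concrete binary parametrization of our own design – the graph X → Z1 → Z2 ← W → Y ← X with Z2 → S and W latent, chosen only because it satisfies the independences used above; it is not a reconstruction of Fig. 2 – and verifies that the right-hand side of (7) matches P(y | do(x)) exactly:

```python
from itertools import product

# Hypothetical SCM (our own): X -> Z1 -> Z2 <- W -> Y <- X, Z2 -> S, W latent.
# All parameters are arbitrary; any values respecting this structure work.
p_z1 = lambda x: 0.2 + 0.6 * x                  # P(Z1=1 | X)
p_z2 = lambda z1, w: 0.1 + 0.3 * z1 + 0.4 * w   # P(Z2=1 | Z1, W)
p_y = lambda x, w: 0.15 + 0.35 * x + 0.3 * w    # P(Y=1 | X, W)
p_s = lambda z2: 0.9 if z2 else 0.3             # P(S=1 | Z2): selection via Z2 only

def bern(p, v):
    return p if v else 1 - p

# Exact joint distribution by enumeration over all binary assignments.
joint = {}
for x, w, z1, z2, y, s in product((0, 1), repeat=6):
    joint[(x, w, z1, z2, y, s)] = (
        bern(0.4, x) * bern(0.7, w) * bern(p_z1(x), z1)
        * bern(p_z2(z1, w), z2) * bern(p_y(x, w), y) * bern(p_s(z2), s))

P = lambda pred: sum(pr for v, pr in joint.items() if pred(*v))

x0 = 1
# No backdoor path from X to Y in this graph, so P(y | do(x)) = P(y | x), as in (2).
truth = (P(lambda x, w, z1, z2, y, s: x == x0 and y)
         / P(lambda x, w, z1, z2, y, s: x == x0))

# Right-hand side of (7): biased conditional combined with the unbiased P(z1, z2).
est = 0.0
for a, b in product((0, 1), repeat=2):
    num = P(lambda x, w, z1, z2, y, s: x == x0 and z1 == a and z2 == b and y and s)
    den = P(lambda x, w, z1, z2, y, s: x == x0 and z1 == a and z2 == b and s)
    est += (num / den) * P(lambda x, w, z1, z2, y, s: z1 == a and z2 == b)

assert abs(truth - est) < 1e-9
```

Note that the inner factor P(z1, z2) is the unbiased marginal, not P(z1, z2 | x): the derivation introduces P(z1) before conditioning on X, which is precisely why the identity holds under the stated independences.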
Generalized Adjustment without External Data
Let us consider the case when only biased data P(v | S=1) over V is measured. Our interest in this section is in conditions that allow P(y | do(x)) to be computed by adjustment without external measurements.
Consider the model G in Fig. 3(a). Note that Y and S are marginally independent in G_X̄ (the graph after an intervention on X, where all edges into X are removed). As for confounding, Z needs to be conditioned on, but doing so opens a path between Y and S, letting spurious correlation from the bias enter our calculation. It turns out that, with a careful manipulation of the expression, both biases can be controlled, as follows:

P(y | do(x)) = P(y | do(x), S=1)                              (8)
             = Σ_Z P(y | do(x), z, S=1) P(z | do(x), S=1)     (9)
             = Σ_Z P(y | x, z, S=1) P(z | S=1)                (10)

Eq. (8) follows from the independence (Y ⊥⊥ S | X) in the mutilated graph G_X̄. Next we condition on Z, and (10) is valid by the application of the second rule of do-calculus to the first term and the third rule to the second term in (9). Note that every term in (10) is estimable from the biased distribution.
Next we introduce a complete criterion to determine whether adjusting for a set of covariates is admissible to identify and recover the causal effect. Before that, we require the concept of a proper causal path.

Definition 3 (Proper Causal Path (Shpitser, VanderWeele, and Robins 2010)). Let X, Y be sets of nodes. A causal path from a node in X to a node in Y is called proper if it does not intersect X except at the end point.

Definition 4 (Generalized Adjustment Criterion Type 1). A set Z satisfies the generalized criterion relative to X, Y in a causal model with graph G augmented with the selection mechanism S if:
(a) No element of Z is a descendant in G_X̄ of any W ∉ X which lies on a proper causal path from X to Y.
(b) All non-causal paths between X and Y in G are blocked by Z and S.
(c) Y is d-separated from S given X under the intervention do(x), i.e., (Y ⊥⊥ S | X)_{G_X̄}.
(d) Every X ∈ X is either a non-ancestor of S or independent of Y in G_X̲, i.e., ∀ X ∈ X ∩ An_S: (X ⊥⊥ Y)_{G_X̲}.
[Figure 3 appears here: panels (a) and (b).]
Figure 3: Models where Z satisfies Def. 4.
G_X̄(S) denotes the graph in which all edges into every X ∈ X \ An_S are removed, where An_V stands for the set of ancestors of a node or set V in G.
Theorem 1 (Generalized Adjustment Formula Type 1). Given disjoint sets of variables X, Y, and Z in a causal model with graph G, the effect P(y | do(x)) is given by

P(y | do(x)) = Σ_Z P(y | x, z, S=1) P(z | S=1)    (11)

in every model inducing G if and only if they satisfy the generalized adjustment criterion type 1 (Def. 4).
The proof of Theorem 1 is presented in the supplemental material due to the length constraints of the paper. Conditions (a) and (b) echo the Extended Backdoor/Adjustment Criterion (Pearl and Paz 2010; Shpitser, VanderWeele, and Robins 2010) and guarantee that Z is admissible for adjustment in the unbiased distribution. Condition (c) requires the outcome Y to be independent of the selection mechanism S without observing any covariate in Z. Intuitively, condition (d) ensures that the distributions of some covariates under selection bias are still viable to control for confounding.
The model in Fig. 3(b) also satisfies Def. 4; in this case, the derivation can be performed by first applying the backdoor adjustment and then introducing S into both terms of the expression. In general, Def. 4 / Thm. 1 encapsulates all possible derivations that allow one to recover from both selection and confounding biases.
Generalized Adjustment With External Data
A natural question that arises is whether additional measurements at the population level over the covariates can help identify and recover the desired causal effect. The following criterion relaxes the previous results by leveraging the available unbiased data.

Definition 5 (Generalized Adjustment Criterion Type 2). Let T be the set of variables measured without selection bias in the overall population. Also, let X, Y, Z be disjoint sets of variables in a causal model with diagram G, augmented with the selection mechanism S. Then, Z satisfies the generalized criterion relative to X, Y if:
(a) No element of Z is a descendant in G_X̄ of any W ∉ X which lies on a proper causal path from X to Y.
(b) All non-causal X-Y paths in G are blocked by Z.
(c) Y is independent of the selection mechanism S given Z and X, i.e., (Y ⊥⊥ S | X, Z).
(d) Z ⊆ T.

Theorem 2 (Generalized Adjustment Formula Type 2). Let T be the set of variables measured without selection bias.
[Figure 4 appears here: panels (a) and (b).]
Figure 4: Models where the set Z satisfies Def. 5.
Given disjoint sets of variables X, Y, and Z ⊆ T and a causal diagram G, then, for every model inducing G, the effect P(y | do(x)) is given by

P(y | do(x)) = Σ_Z P(y | x, z, S=1) P(z)    (12)

if and only if the set Z satisfies the generalized adjustment criterion type 2 relative to the pair X, Y.
Proof. Suppose the set Z satisfies the conditions relative to X, Y. Then, by conditions (a) and (b), for every model induced by G we have:

P(y | do(x)) = Σ_Z P(y | x, z) P(z)

We note that S can be introduced into the first term by condition (c), which entails Eq. (12). The necessity part of the proof is more involved and is provided in the supplemental material (Correa and Bareinboim 2016).
As in Def. 4, conditions (a) and (b) ensure Z is valid for adjustment without selection bias. Condition (c) requires that the influence of the selection mechanism on the outcome be nullified by conditioning on X and Z, and condition (d) simply guarantees that the adjustment expression can be estimated from the available data. Fig. 4 presents two causal models that satisfy the previous criterion if measurements over Z = {Z1, Z2, Z3} are available. To witness how the expression can be reached using do-calculus and probability axioms, consider Fig. 4(a):
P(y | do(x)) = Σ_{Z3} P(y | do(x), z3) P(z3 | do(x))             (13)
             = Σ_{Z3} P(y | x, z3) P(z3)                         (14)
             = Σ_{Z1,Z3} P(y | x, z1, z3) P(z1, z3)              (15)
             = Σ_Z P(y | x, z) P(z2 | x, z1, z3) P(z1, z3)       (16)
             = Σ_Z P(y | x, z, S=1) P(z)                         (17)
We start by conditioning on Z3 and removing do(x) using rule 3 of the do-calculus. Then, summing over Z1 in the second term, pulling the new sum out, and introducing Z1 into the first term using (Y ⊥⊥ Z1 | Z3, X) results in (15). Eq. (16) follows from conditioning the first term on Z2; finally, removing X in the second term using the independence (Z2 ⊥⊥ X | Z1, Z3), combining the last two distributions over the Z's, and introducing the selection bias term using the independence (Y ⊥⊥ S | X, Z) results in (17), which corresponds to (12).
The model in Fig. 4(b) also satisfies the Type 2 criterion and illustrates how the result applies to models where X and Y may be sets of variables.
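Operationally, Eq. (12) blends the two data sources: strata proportions come from the unbiased covariate sample, while the outcome conditional comes from the selection-biased records. A minimal sketch of such an estimator (the helper name and the toy records below are our own invention):

```python
from collections import Counter

def adjust_with_external(biased, unbiased_z, x0):
    """Estimate P(Y=1 | do(X=x0)) as in Eq. (12):
        sum_z P(Y=1 | X=x0, Z=z, S=1) * P(z).
    biased     : sequence of (x, z, y) records collected under S=1
    unbiased_z : sequence of z values drawn from the overall population
    """
    unbiased_z = list(unbiased_z)
    pz = Counter(unbiased_z)                   # empirical, unbiased P(z)
    total = 0.0
    for z, cnt in pz.items():
        ys = [y for (x, zz, y) in biased if x == x0 and zz == z]
        if not ys:
            raise ValueError(f"no biased samples in stratum z={z!r}")
        total += (sum(ys) / len(ys)) * (cnt / len(unbiased_z))
    return total

# Toy illustration with made-up records:
biased = [(1, 0, 1), (1, 0, 0), (1, 1, 1), (1, 1, 1), (0, 0, 0), (0, 1, 1)]
census_z = [0, 0, 1, 1]                        # unbiased: P(Z=0) = P(Z=1) = 0.5
est = adjust_with_external(biased, census_z, x0=1)   # 0.5*0.5 + 1.0*0.5 = 0.75
```

The failure branch matters in practice: Eq. (12) is only estimable when every covariate stratum with positive P(z) is represented in the biased sample for the treatment level of interest.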
Finding Admissible Sets for Generalized Adjustment
A natural extension to the problem is how to systematically list admissible sets for adjustment using the criteria discussed in the previous sections. This is especially important in practice, where factors such as feasibility, cost, and statistical power bear on the choice of a covariate set.
To perform this kind of task efficiently, (van der Zander, Liskiewicz, and Textor 2014) introduced a transformation of the model called the proper backdoor graph and formulated a criterion equivalent to the adjustment criterion:

Definition 6 (Proper Backdoor Graph). Let G = (V, E) be a DAG, and X, Y ⊆ V be pairwise disjoint subsets of variables. The proper backdoor graph, denoted G^pbd_XY, is obtained from G by removing the first edge of every proper causal path from X to Y.

Definition 7 (Constructive Backdoor Criterion (CBD)). Let G = (V, E) be a DAG, and X, Y ⊆ V be pairwise disjoint subsets of variables. The set Z satisfies the constructive backdoor criterion relative to (X, Y) in G if:
(i) Z ⊆ V \ Dpcp(X, Y), and
(ii) Z d-separates X and Y in the proper backdoor graph G^pbd_XY,
where Dpcp(X, Y) = De((De_{G_X̄}(X) \ X) ∩ An_{G_X̲}(Y)).
The set Dpcp(X, Y) is exactly the set of nodes forbidden by the first condition in both our generalized criteria, and G^pbd_XY only contains the X-Y paths that need to be blocked.

Lemma 3 (Constructive Backdoor ⟹ Generalized Adjustment Type 2). Any set Z satisfying the CBD applied to G^pbd_(X∪S)Y and Dpcp(X ∪ S, Y) ∪ (V \ T), relative to X, Y in G, also satisfies the Generalized Adjustment Criterion Type 2.
Proof. By the equivalence between the CBD criterion and the adjustment criterion, we have that Dpcp(X, Y) is exactly the set of nodes forbidden by condition (a) of the Type 2 criterion, so

Dpcp(X ∪ S, Y) = De((De_{G_{X̄,S̄}}(X ∪ {S}) \ (X ∪ S)) ∩ An_{G_{X̲,S̲}}(Y))    (18)

Since S has no descendants, De_{G_{X̄,S̄}}(X ∪ {S}) = De_{G_X̄}(X) ∪ {S} and An_{G_{X̲,S̲}}(Y) = An_{G_X̲}(Y). As a consequence, Dpcp(X ∪ S, Y) = Dpcp(X, Y), implying condition (a) of Def. 5.

G^pbd_(X∪S)Y has all the non-causal paths from X to Y that are present in G^pbd_XY; therefore, if Z blocks all non-causal paths in the former, it does so in the latter, satisfying condition (b).
Every S-Y path may or may not contain X. If it does not, Z must block it in G^pbd_(X∪S)Y. In the latter case, the subpath from X to Y is either causal or non-causal. If it is causal, Z will not block it, but the S-Y path will be blocked by X. If the subpath is non-causal, Z must block it, and therefore the larger path is blocked too. This argument implies condition (c). Since the CBD holds for Dpcp(X ∪ S, Y) ∪ (V \ T), every element in Z must belong to T, satisfying condition (d).
Lemma 4 (Generalized Adjustment Type 2 ⟹ Constructive Backdoor). Any set Z satisfying the Generalized Adjustment Criterion Type 2 relative to X, Y in G also satisfies the constructive backdoor criterion applied to G^pbd_(X∪S)Y and Dpcp(X ∪ S, Y) ∪ (V \ T).

Proof. By Lemma 3, Dpcp(X ∪ S, Y) = Dpcp(X, Y), which combined with condition (d) implies condition (i) of the CBD.

By condition (b), every non-causal path from X to Y is blocked by Z, and all paths from S to Y (which are always non-causal when S is treated as a member of X) are blocked by Z and X by condition (c). These two facts together imply condition (ii) of the CBD.
Theorem 5 (Generalized Adjustment Type 2 ⟺ Constructive Backdoor). A set Z satisfies the Generalized Adjustment Criterion Type 2 relative to X, Y in G if and only if it satisfies the CBD applied to G^pbd_(X∪S)Y and Dpcp(X ∪ S, Y) ∪ (V \ T).

Proof. It follows immediately from Lemmas 3 and 4.
Thm. 5 allows us to use the LISTSEP procedure (van der Zander, Liskiewicz, and Textor 2014) to list all the valid sets for generalized adjustment Type 2. The algorithm guarantees O(n(n+m)) polynomial delay, where n is the number of nodes and m is the number of edges in G (see (Takata 2010)). That means that the time needed to output the first solution or indicate failure, and the time between the outputs of consecutive solutions, is O(n(n+m)).
To give the reader an intuition of how the algorithm works, consider the graph in Fig. 5(a) and its associated proper backdoor graph in (b). W is a "forbidden node" in the sense that it cannot be used for adjustment, and in this example it is the only element in Dpcp(X, Y), assuming that unbiased measurements of the covariates Z1, Z2, and Z3 are available (i.e., {Z1, Z2, Z3} ⊆ T). The algorithm LISTSEP will output every set of variables that d-separates X ∪ S from Y in the proper backdoor graph and does not contain any node in Dpcp(X, Y).
Conclusions
We provide necessary and sufficient conditions for identification and recoverability from selection bias of causal effects by adjustment, applicable to data-generating models with latent variables and arbitrary structure in non-parametric settings. Def. 4 and Thm. 1 provide a complete characterization of identification and recoverability by adjustment when no external information is available. Def. 5 and Thm. 2 provide a complete graphical condition for when external information on a set of covariates is available. Thm. 5 allows us to list all sets that satisfy the last
[Figure 5 appears here: panels (a) and (b) over X1, X2, W, Z1, Z2, Z3, Y, S.]
Figure 5: (a) shows a causal model and (b) the proper backdoor graph associated with it relative to X ∪ S and Y. The gray nodes in (b) represent variables in Dpcp.
criterion in polynomial-delay time, effectively helping in the decision of which covariates need to be measured for recoverability. This is especially important when measuring a variable is associated with a particular cost or effort. Despite the fact that adjustment is neither complete nor the only method to identify causal effects, it is in fact the most used tool in the empirical sciences. The methods developed in this paper should help to formalize and alleviate the problem of sampling selection and confounding biases in a broad range of data-intensive applications.
References
Angrist, J. D. 1997. Conditional independence in sample selection models. Economics Letters 54(2):103-112.
Bareinboim, E., and Pearl, J. 2012. Controlling selection bias in causal inference. In Lawrence, N., and Girolami, M., eds., Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 100-108. La Palma, Canary Islands: JMLR.
Bareinboim, E., and Pearl, J. 2016. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences 113:7345-7352.
Bareinboim, E., and Tian, J. 2015. Recovering causal effects from selection bias. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 3475-3481.
Bareinboim, E.; Tian, J.; and Pearl, J. 2014. Recovering from selection bias in causal and statistical inference. In Brodley, C. E., and Stone, P., eds., Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2410-2416. Palo Alto, CA: AAAI Press.
Cooper, G. 1995. Causal discovery from data in the presence of selection bias. In Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, 140-150.
Correa, J. D., and Bareinboim, E. 2016. Causal effect identification by adjustment under confounding and selection biases - supplemental material. Technical report, Purdue AI Lab, Department of Computer Science, Purdue University.
Cortes, C.; Mohri, M.; Riley, M.; and Rostamizadeh, A. 2008. Sample selection bias correction theory. In International Conference on Algorithmic Learning Theory, 38-53. Springer.
Heckman, J. J. 1979. Sample selection bias as a specification error. Econometrica 47(1):153-161.
Kuroki, M., and Cai, Z. 2006. On recovering a population covariance matrix in the presence of selection bias. Biometrika 93(3):601-611.
Little, R. J. A., and Rubin, D. B. 1986. Statistical Analysis with Missing Data. New York, NY, USA: John Wiley & Sons, Inc.
Maathuis, M. H., and Colombo, D. 2015. A generalized back-door criterion. Annals of Statistics 43(3):1060-1088.
Mefford, J., and Witte, J. S. 2012. The covariate's dilemma. PLoS Genetics 8(11):e1003096.
Pearl, J., and Paz, A. 2010. Confounding equivalence in causal inference. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, 433-441. Corvallis, OR: AUAI.
Pearl, J. 1993. Aspects of graphical models connected with causality. In Proceedings of the 49th Session of the International Statistical Institute, 391-401.
Pearl, J. 1995. Causal diagrams for empirical research. Biometrika 82(4):669-688.
Pearl, J. 2000. Causality: Models, Reasoning, and Inference. New York: Cambridge University Press. 2nd edition, 2009.
Pirinen, M.; Donnelly, P.; and Spencer, C. C. 2012. Including known covariates can reduce power to detect genetic effects in case-control studies. Nature Genetics 44(8):848-851.
Robins, J. M. 2001. Data, design, and background knowledge in etiologic inference. Epidemiology 12(3):313-320.
Robinson, L. D., and Jewell, N. P. 1991. Some surprising results about covariate adjustment in logistic regression models. International Statistical Review 227-240.
Rubin, D. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66:688-701.
Shpitser, I.; VanderWeele, T. J.; and Robins, J. M. 2010. On the validity of covariate adjustment for estimating causal effects. In Proceedings of UAI 2010, 527-536.
Takata, K. 2010. Space-optimal, backtracking algorithms to list the minimal vertex separators of a graph. Discrete Applied Mathematics 158(15):1660-1667.
van der Zander, B.; Liskiewicz, M.; and Textor, J. 2014. Constructing separators and adjustment sets in ancestral graphs. In Proceedings of UAI 2014, 907-916.
Zadrozny, B. 2004. Learning and evaluating classifiers under sample selection bias. In Proceedings of the Twenty-First International Conference on Machine Learning, 114. ACM.
Zhang, J. 2008. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence 172:1873-1896.
Appendix
In order to prove the necessity of the criteria presented in the paper, it is imperative to construct structural causal models that serve as counter-examples to the identifiability or recoverability of the causal effect whenever the set of covariates Z fails to satisfy the conditions relative to the pair X, Y. The following lemmata will be useful for constructing such models. The first one, Lemma 6, licenses the direct specification of the conditional distribution of any variable given its parents, in accordance with the causal diagram G.

Lemma 6 (Family Parametrization). Let G be a causal diagram over a set V of n variables. Consider also a set of conditional distributions P(v_i | pa_{V_i}), 1 ≤ i ≤ n, where Pa_{V_i} is the set of nodes in G from which there are edges pointing into V_i. Then, there exists a model M compatible with G that induces

    P(v) = ∏_{i=1}^{n} P(v_i | pa_{V_i}).
Proof. (By construction) For every V_i, define any ordering on the values of its domain, and let v_i^{(j)} refer to the j-th value in that order. Also, define a continuous unobservable variable U_i ∼ U[0, 1] (uniformly distributed in the interval [0, 1]) for every variable V_i ∈ V. Then, construct a structural causal model M = ⟨U, V, F, P(u)⟩ where:

• V is the same set of observables as in G
• U = ⋃_{i=1}^{n} {U_i}, with every U_i ∼ U[0, 1]
• F = { f_i(pa_{V_i}, u_i) = v_i^{(j*)}, where j* = inf_j { ∑_{k=1}^{j} P(v_i^{(k)} | pa_{V_i}) ≥ u_i }, 1 ≤ i ≤ n }

At every variable V_i, given a particular configuration of Pa_{V_i}, M simulates its value using the distribution P(v_i | pa_{V_i}). By the Markov property, the joint distribution will be equal to the product of those distributions.
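The inverse-CDF construction in the proof is easy to simulate. The sketch below is only an illustration of Lemma 6, not code from the paper; the helper name and data layout are invented, and binary domains are used for brevity. Each V_i is assigned the first domain value whose cumulative conditional probability reaches an independent U_i ∼ U[0, 1].

```python
import random

def make_family_sampler(cpts, order):
    """Sample from P(v) = prod_i P(v_i | pa_{V_i}) via the Lemma 6 construction.

    cpts maps a variable name to (parent_names, table), where table maps a
    tuple of parent values to a list of (value, probability) pairs.
    order must be a topological order of the diagram."""
    def sample():
        v = {}
        for name in order:
            parents, table = cpts[name]
            key = tuple(v[p] for p in parents)
            u = random.random()                 # exogenous U_i ~ U[0, 1]
            acc = 0.0
            for value, prob in table[key]:
                acc += prob
                if acc >= u:                    # inf_j { sum_{k<=j} P >= u_i }
                    v[name] = value
                    break
            else:                               # guard against float round-off
                v[name] = table[key][-1][0]
        return v
    return sample

# A two-variable chain X -> Y; the induced joint matches the CPT product.
cpts = {
    "X": ([], {(): [(0, 0.7), (1, 0.3)]}),
    "Y": (["X"], {(0,): [(0, 0.9), (1, 0.1)],
                  (1,): [(0, 0.2), (1, 0.8)]}),
}
sampler = make_family_sampler(cpts, ["X", "Y"])
random.seed(0)
n = 50000
freq_x1 = sum(sampler()["X"] for _ in range(n)) / n   # should be near 0.3
```

A Monte Carlo estimate of any marginal of the sampler then agrees with the corresponding sum over the product of the specified conditionals.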
The following lemma permits the construction of a structural causal model M′ compatible with a causal diagram G′, using another model compatible with a related, but different, causal diagram G where some arrows along a chain of variables have the reverse direction.

Lemma 7 (Chain Reversal). Let M be a model compatible with a causal diagram G containing a chain T → R_1 → · · · → R_ℓ, where each R_i has no parent other than its predecessor in the chain. Then there exists a model M′, compatible with the diagram G′ obtained from G by reversing the arrows along the chain, that induces the same distribution P(v).

Proof. (By construction) Given M and the probability distribution P(v) it induces, compute the joint distribution P(r_1, ..., r_ℓ, t). Construct a new model M′ with the same set of observable variables and identical functions for all variables except R_1, ..., R_ℓ, T. For those, assign functions f_{R_i}(r_{i+1}, U_{R_i}), 1 ≤ i ≤ ℓ−1, as in Lemma 6, following the reverse factorization of P(r_1, ..., r_ℓ, t); also, let f_{R_ℓ}(U_{R_ℓ}) = U_{R_ℓ} with P(U_{R_ℓ}) = P(r_ℓ). By Lemma 6, the sub-models composed of R_1, ..., R_ℓ, T in M′ and M produce the exact same distribution, and since the sets of parents and the functions for every other part of the model are exactly the same, the overall distribution is identical.
Finally, the following lemma allows us to simplify the parametrization of an arbitrarily long chain of binary variables.
Lemma 8 (Collapsible Path Parametrization). Consider a causal diagram G and a probability distribution P(v) induced by any SCM compatible with G. Suppose G contains a chain W_0 → W_1 → · · · → W_k, where each W_i represents a binary random variable, for every 1 ≤ i ≤ k the only incoming edge to W_i is from W_{i−1}, and every conditional distribution satisfies P(w_i | w_{i−1}) = p and P(w_i | \bar{w}_{i−1}) = q for some 0 < p, q < 1. Then, the conditional distribution of W_k given W_0 is

    P(w_k | w_0) = (q − (p−1)(p−q)^k) / (q − p + 1),    P(w_k | \bar{w}_0) = (q − q(p−q)^k) / (q − p + 1).
Proof. Since W_0, ..., W_k form a chain, the value of W_k is a function of W_0 when all the intermediate W_1, ..., W_{k−1} are marginalized. Moreover, every W_i, 1 ≤ i ≤ k, is independent of any other variable given W_0. Therefore,

    P(w_k | w_0) = ∑_{w_1, ..., w_{k−1}} ∏_{i=1}^{k} P(w_i | w_{i−1}),

because any other variable can be removed from any factor in this expression and summed out. This distribution can be calculated as the product of the 2×2 matrices corresponding to the conditional distributions P(w_i | w_{i−1}) when encoded as

    W_M = [[ p, q ], [ 1−p, 1−q ]].

The product of k such matrices is readily available if W_M is decomposed using its eigenvalues {1, p−q} and eigenvectors {[q/(1−p), 1]ᵀ, [1, −1]ᵀ}:

    P(w_k | w_0) = ∑_{w_1, ..., w_{k−1}} ∏_{i=1}^{k} P(w_i | w_{i−1}) = (W_M)^k        (19)

    = [[ (q − (p−1)(p−q)^k)/(q−p+1),      (q − q(p−q)^k)/(q−p+1)     ],
       [ 1 − (q − (p−1)(p−q)^k)/(q−p+1),  1 − (q − q(p−q)^k)/(q−p+1) ]]        (20)
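The closed form in (20) can be checked mechanically: the k-step conditional of the chain is the k-th power of the column-stochastic matrix W_M. The snippet below is a numerical sanity check of the lemma, not part of the proof; the parameter values are the ones used repeatedly in the counterexamples of this appendix.

```python
import numpy as np

def chain_conditional(p, q, k):
    """Closed form of Lemma 8: (P(w_k | w_0), P(w_k | not w_0))."""
    return ((q - (p - 1) * (p - q) ** k) / (q - p + 1),
            (q - q * (p - q) ** k) / (q - p + 1))

p, q, k = 3 / 5, 2 / 5, 7
W_M = np.array([[p, q],
                [1 - p, 1 - q]])       # columns indexed by w_{i-1}, rows by w_i
top_row = np.linalg.matrix_power(W_M, k)[0]
assert np.allclose(top_row, chain_conditional(p, q, k))

# With p = 3/5, q = 2/5 the lemma yields 1/2 +- (1/5)^k / 2, the form used
# throughout the constructions below.
eps = (1 / 5) ** k
assert np.allclose(top_row, [0.5 + eps / 2, 0.5 - eps / 2])
```

The same check passes for any 0 < p, q < 1, since the matrix power and the eigendecomposition agree exactly.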
Definition 8 (Generalized Adjustment Criterion Type 1). A set Z satisfies the generalized criterion relative to X, Y in a causal model with graph G augmented with the selection mechanism S if:

(a) No element of Z is a descendant in G_{\bar{X}} of any W ∉ X which lies on a proper causal path from X to Y.
(b) All non-causal paths between X and Y in G are blocked by Z and S.
(c) Y is d-separated from S given X under the intervention do(x), i.e., (Y ⊥⊥ S | X)_{G_{\bar{X}}}.
(d) Every X ∈ X is either a non-ancestor of S or independent of Y in G_{\bar{X}(S)}, i.e., ∀ X ∈ X ∩ An_S : (X ⊥⊥ Y)_{G_{\bar{X}(S)}}.

Here G_{\bar{X}(S)} is the graph where all edges into X ∈ X \ An_S are removed, and An_V stands for the set of ancestors of a node or set V in G.

Theorem 1 (Generalized Adjustment Formula Type 1). Given disjoint sets of variables X, Y and Z in a causal model with graph G, the effect P(y | do(x)) is given by

    P(y | do(x)) = ∑_{z} P(y | x, z, S=1) P(z | S=1)        (21)

in every model inducing G if and only if X, Y, Z satisfy the generalized adjustment criterion type 1 (Def. 8).
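Read as an estimator, Eq. (21) requires nothing beyond the selection-biased data itself: both the conditional and the covariate distribution are taken from samples collected under S=1. The plug-in sketch below is illustrative only (the function and variable names are invented, and Y is assumed binary); it is not an implementation provided by the paper.

```python
from collections import Counter

def adjust_type1(samples, x_val, z_vars, x_var="X", y_var="Y"):
    """Plug-in estimate of sum_z P(y | x, z, S=1) P(z | S=1), where every
    record in `samples` was collected under selection (S=1)."""
    n = len(samples)
    z_counts = Counter(tuple(s[z] for z in z_vars) for s in samples)
    est = 0.0
    for z_val, n_z in z_counts.items():
        cell = [s for s in samples
                if tuple(s[z] for z in z_vars) == z_val and s[x_var] == x_val]
        if cell:  # skip empty x-z cells (no data to estimate the conditional)
            p_y = sum(s[y_var] for s in cell) / len(cell)  # P(Y=1 | x, z, S=1)
            est += p_y * n_z / n                           # weight: P(z | S=1)
    return est

# Hand-checkable toy data: P(Z=0 | S=1) = 2/3 with P(Y=1 | X=1, Z=0, S=1) = 1/2,
# and P(Z=1 | S=1) = 1/3 with P(Y=1 | X=1, Z=1, S=1) = 1, so the estimate is 2/3.
data = [{"Z": 0, "X": 1, "Y": 1}, {"Z": 0, "X": 1, "Y": 0},
        {"Z": 0, "X": 0, "Y": 0}, {"Z": 0, "X": 0, "Y": 0},
        {"Z": 1, "X": 1, "Y": 1}, {"Z": 1, "X": 1, "Y": 1}]
estimate = adjust_type1(data, 1, ["Z"])
```

Of course, the estimate equals P(y | do(x)) only when Z satisfies Definition 8 in the underlying graph.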
Lemma 2. Let X, Y, Z be three disjoint sets of variables in the graph G. If Z satisfies the conditions of the criterion type 1 relative to the pair X, Y, then Z can be partitioned into the following sets of variables:

• Z^{Y,1}_{nd} = { Z ∈ Z \ De_X : (Z ⊥⊥ Y | X, S)_{G_{\bar{X}}} }
• Z^{X,1}_{nd} = { Z ∈ Z \ De_X \ Z^{Y,1}_{nd} : (Z ⊥⊥ X | Z^{Y,1}_{nd}, S)_{G_{\bar{X}(S)}} }
• Z^{Y}_{d} = { Z ∈ Z ∩ De_X : (Z ⊥⊥ Y | X, Z^{Y,1}_{nd}, Z^{X,1}_{nd}, S)_{G_{\bar{X}}} }
• Z^{X}_{d} = Z ∩ De_X \ Z^{Y}_{d}
• Z^{Y,2}_{nd} = { Z ∈ Z \ De_X \ Z^{Y,1}_{nd} \ Z^{X,1}_{nd} : (Z ⊥⊥ Y | X, Z^{Y,1}_{nd}, Z^{X,1}_{nd}, Z^{Y}_{d}, Z^{X}_{d}, S)_{G_{\bar{X}}} }
• Z^{X,2}_{nd} = Z \ De_X \ Z^{Y,1}_{nd} \ Z^{X,1}_{nd} \ Z^{Y,2}_{nd}

Moreover, (Z^{X}_{d} ⊥⊥ X | Z^{Y}_{d}, Z^{Y,1}_{nd}, Z^{X,1}_{nd}, S)_{G_{\bar{X}(Z^{Y}_{d}, S)}} and (Z^{X,2}_{nd} ⊥⊥ X | Z \ Z^{X,2}_{nd}, S)_{G_{\bar{X}(Z^{Y}_{d}, Z^{X}_{d}, S)}} hold.
Proof. First, to show (Z^{X,2}_{nd} ⊥⊥ X | Z \ Z^{X,2}_{nd}, S)_{G_{\bar{X}(Z^{Y}_{d}, Z^{X}_{d}, S)}}, assume, for the sake of contradiction, that it is not the case. Then, there exists a covariate Z' ∈ Z \ De_X \ Z^{Y,1}_{nd} \ Z^{X,1}_{nd} \ Z^{Y,2}_{nd} that is d-connected to some X ∈ X by a path q. For Z' not to be in Z^{Y,1}_{nd}, it must be d-connected to some Y ∈ Y by a path p that does not contain any collider, except possibly Z' itself. In particular, p may not contain S because of cond. (c). For Z' not to be in Z^{Y,2}_{nd}, p does not contain any covariates in Z^{Y,1}_{nd}, Z^{X,1}_{nd}, Z^{Y}_{d}, Z^{X}_{d} or Z^{Y,2}_{nd}, because, as p has no colliders, any of those would close it. If X is not an ancestor of S, q should have arrows going out from X. Since Z' is a non-descendant of X, q would contain a collider; but for Z' not to be in Z^{X,1}_{nd}, such a collider must be an ancestor of S, contradicting the assumption that X is not an ancestor of S. Hence, X must be an ancestor of S.

If the path q has arrows into X, the junction of the paths q and p witnesses a violation of cond. (d) of the criterion, unless Z' is a collider, in which case the same path is non-causal, proper, and open after all variables in Z are observed, contradicting cond. (b). If q has arrows going out from X, and given that Z' is not a descendant of X, S must be a collider in q for Z' not to be in Z^{X,1}_{nd}. But, for cond. (c) to hold, Z' must be a descendant of a collider in p itself, because p has no other colliders. Then, observing Z' will activate the path constituted by the concatenation of q and p, contradicting cond. (b). As a consequence, such a Z' cannot exist, proving the first part of the claim.
Second, consider (Z^{X}_{d} ⊥⊥ X | Z^{Y}_{d}, Z^{Y,1}_{nd}, Z^{X,1}_{nd}, S)_{G_{\bar{X}(Z^{Y}_{d}, S)}} and note that the empty set satisfies it. For the case when Z ∩ De_X \ Z^{Y}_{d} ≠ ∅, assume for the sake of contradiction that the independence does not hold. Then, there exist Z' ∈ Z and some X ∈ X such that a directed path q goes from X to Z' and is open when X and S are observed (note that q cannot contain any non-descendant of X). Since Z' ∉ Z^{Y}_{d}, there exists also a path p from Z' to some Y ∈ Y that is open when X, Z^{Y,1}_{nd}, Z^{X,1}_{nd}, Z^{Y}_{d}, S are observed, and does not contain any node in X.

The path p must contain a collider (possibly Z'); otherwise, the junction of q and p is a proper causal path containing Z', which contradicts cond. (a) of the criterion. If Z' is the collider itself, cond. (b) will be violated unless a variable in Z^{Y,2}_{nd} or Z^{X,2}_{nd} closes q. But the former cannot do it, because any variable in q would not satisfy the definition of Z^{Y,2}_{nd}, and similarly, no node in q can be in Z^{X,2}_{nd}. Therefore, Z' cannot be the collider in p. This means that some variable W ∈ {S} ∪ Z^{Y}_{d} ∪ Z^{Y,1}_{nd} ∪ Z^{X,1}_{nd} is the collider in p. However, S cannot be the one, because that would violate cond. (c). Similarly, W does not satisfy the definition of Z^{Y}_{d}, because it is not independent of Y. And W cannot be in Z^{Y,1}_{nd} or Z^{X,1}_{nd} because it is a descendant of X. As a consequence, such an active path p is not possible, meaning that Z' ∈ Z^{Y}_{d} or does not exist.
Proof. (Of Theorem 1). (if) Suppose Z satisfies the conditions of the theorem relative to the pair X, Y. Then Z can be partitioned into several sets as in Lemma 2. The causal effect is derived as follows.

First, by condition (c), (Y ⊥⊥ S | X)_{G_{\bar{X}}}, and S can be introduced into the expression:

    P(y | do(x)) = P(y | do(x), S=1)        (22)

By definition, Z^{Y,1}_{nd} can be introduced in the factor along with a sum over that variable:

    P(y | do(x)) = ∑_{z^{Y,1}_{nd}} P(y | do(x), z^{Y,1}_{nd}, S=1) P(z^{Y,1}_{nd} | S=1)        (23)
Conditioning on Z^{X,1}_{nd}, it becomes:

    P(y | do(x)) = ∑_{z^{Y,1}_{nd}, z^{X,1}_{nd}} [ P(y | do(x), z^{Y,1}_{nd}, z^{X,1}_{nd}, S=1) P(z^{X,1}_{nd} | do(x), z^{Y,1}_{nd}, S=1) P(z^{Y,1}_{nd} | S=1) ]        (24)

By definition of Z^{X,1}_{nd}, the independence (Z^{X,1}_{nd} ⊥⊥ X | Z^{Y,1}_{nd}, S)_{G_{\bar{X}(S)}} holds (note that G_{\bar{X}(S)} = G_{\bar{X}(Z^{Y,1}_{nd}, S)}, since Z^{Y,1}_{nd} is not a descendant of X), and rule 3 of the do-calculus allows the removal of do(x) from the second term of the previous expression, which allows joining the second and third factors together using the chain rule:
    P(y | do(x)) = ∑_{z^{Y,1}_{nd}, z^{X,1}_{nd}} [ P(y | do(x), z^{Y,1}_{nd}, z^{X,1}_{nd}, S=1) P(z^{Y,1}_{nd}, z^{X,1}_{nd} | S=1) ]        (25)

The second factor can be summed over Z^{Y}_{d}; then pull the sum out and introduce z^{Y}_{d} in the first factor:

    P(y | do(x)) = ∑_{z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}} [ P(y | do(x), z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, S=1) P(z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d} | S=1) ]        (26)

Conditioning the first factor on Z^{X}_{d} yields:
    P(y | do(x)) = ∑_{z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d}} [ P(y | do(x), z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d}, S=1) P(z^{X}_{d} | do(x), z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, S=1) P(z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d} | S=1) ]        (27)

By Lemma 2, the independence (Z^{X}_{d} ⊥⊥ X | Z^{Y}_{d}, Z^{Y,1}_{nd}, Z^{X,1}_{nd}, S)_{G_{\bar{X}(Z^{Y}_{d}, S)}} holds, and using rule 3 of the do-calculus, the do(x) operator can be removed from the second factor, which allows joining it with the third factor:
    P(y | do(x)) = ∑_{z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d}} [ P(y | do(x), z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d}, S=1) P(z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d} | S=1) ]        (28)

Summing over Z^{Y,2}_{nd} in the second factor, pulling the summation out, and introducing the same term into the first factor using the independence (Z^{Y,2}_{nd} ⊥⊥ Y | X, Z^{Y,1}_{nd}, Z^{X,1}_{nd}, Z^{Y}_{d}, Z^{X}_{d}, S)_{G_{\bar{X}}} yields:
    P(y | do(x)) = ∑_{z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d}, z^{Y,2}_{nd}} [ P(y | do(x), z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d}, z^{Y,2}_{nd}, S=1) P(z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d}, z^{Y,2}_{nd} | S=1) ]        (29)

Conditioning on Z^{X,2}_{nd}:

    P(y | do(x)) = ∑_{z} [ P(y | do(x), z, S=1) P(z^{X,2}_{nd} | do(x), z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d}, z^{Y,2}_{nd}, S=1) P(z^{Y,1}_{nd}, z^{X,1}_{nd}, z^{Y}_{d}, z^{X}_{d}, z^{Y,2}_{nd} | S=1) ]        (30)
By Lemma 2, the independence (Z^{X,2}_{nd} ⊥⊥ X | Z \ Z^{X,2}_{nd}, S)_{G_{\bar{X}(Z^{Y}_{d}, Z^{X}_{d}, S)}} holds, and using rule 3 of the do-calculus, the do(x) operator can be removed from the second factor, which allows joining it with the third factor:

    P(y | do(x)) = ∑_{z} P(y | do(x), z, S=1) P(z | S=1)        (31)

Conditions (a) and (b) imply (Y ⊥⊥ X | Z, S=1)_{G_{\underline{X}}}, which can be used together with rule 2 of the do-calculus to remove the do(·) operator from the first factor, resulting in the adjustment formula:

    P(y | do(x)) = ∑_{z} P(y | x, z, S=1) P(z | S=1)        (32)
(Only if) Note that condition (b) extends the second condition of the adjustment criterion by also requiring all non-causal paths to be blocked even when S is observed. Suppose conditions (a) and (b) do not hold. Then, for any model compatible with G_{\bar{S}}, which is also compatible with G, the adjustment formula is equal to ∑_{z} P(y | x, z) P(z). But by the adjustment criterion (Shpitser, VanderWeele, and Robins 2010) this expression will not be equal to P(y | do(x)).

To show the necessity of the extension of condition (b), and of conditions (c) and (d), we construct counterexamples to the identifiability and recoverability of the causal effect. In every case, let V represent all variables in the graph except for the selection mechanism S, and let Q refer to the adjustment formula as in Eq. (32). We construct two SCMs, M1 and M2, that induce probability distributions P1 and P2, respectively. M1 and M2 will be compatible with G and agree on the probability distribution under selection bias,

    P1(v | S=1) = P2(v | S=1),        (33)

but Q1 in M1 will be a different distribution than Q2 in M2. Let M1 be compatible with G and M2 with G_{\bar{S}}, making S independent from Pa_S in M2 (i.e., (V ⊥⊥ S)_{P2}). Recoverability should hold for any parametrization; hence, without loss of generality, all variables are assumed to be binary. The construction parametrizes P1 through its factors (as in Lemma 6) and then parametrizes P2 to enforce (33). As a consequence, P2(v) = P2(v | S=1).
Without loss of generality, our attention can be directed to the particular Y' ∈ Y not satisfying the condition, and to the causal effect on Y'. To do this, the constructed model will have every variable in Y \ {Y'} disconnected from the graph; more precisely, (Y \ {Y'} ⊥⊥ V) holds, so that:

    P(y | do(x)) = ∑_{z} P(y | x, z) P(z)
                 = ∏_{Y} ∑_{z} P(y | x, z) P(z)
                 = ( ∏_{Y \ {Y'}} P(y) ) ∑_{z} P(y' | x, z) P(z)
                 = γ ∑_{z} P(y' | x, z) P(z),

where γ represents the product of the marginal distributions of the remaining Y \ {Y'}.

Figure 6: Non-causal path between X and Y activated when S and Z are observed; the path traverses X, L, Z1, N, R, S, Q, P, Z2, O, Y'. Edges with dotted lines encode arbitrarily long chains of nodes.
Suppose that condition (b) is not satisfied because there is some non-causal path from X ∈ X to Y ∈ Y blocked by Z but open when S is observed. The model in Fig. 6 contains a non-causal path that has arrows going out of X and Y. Models with arrows incoming to those variables belong to the same equivalence class as this one (Lemma 7 can be used to reverse the directionality along chains of variables). It also contains one collider before and one after S. Models with more colliders in the non-causal path can be constructed in a similar way. Following the general construction described before, the values for Q1 and Q2 are:
    Q1 = γ ∑_{z} P1(y' | x, z) P1(z) = γ ∑_{z} P1(y') P1(z) = γ P1(y')

    Q2 = γ ∑_{z} P2(y' | x, z) P2(z)
       = γ ∑_{z} P1(y' | x, z, S=1) P1(z | S=1)
       = γ ∑_{z} [ ∑_{l,n,r,q,p,o} P1(y', x, z, l, n, r, q, p, o, S=1) / ∑_{y',l,n,r,q,p,o} P1(y', x, z, l, n, r, q, p, o, S=1) ] P1(z | S=1)

The numerator of the first term in the summation can be factorized as

    P1(y', x, z, l, n, r, q, p, o, S=1) = P1(x) P1(l | x) P1(z_1 | l, n) P1(n) P1(r | n) P1(S=1 | r, q) P1(q | p) P1(z_2 | p, o) P1(p) P1(o | y') P1(y')
All the conditional distributions in the previous expression can be parametrized using Lemma 6 and Lemma 8 with parameters p = 3/5, q = 2/5, as follows: P1(x) = P1(y') = P1(n) = P1(p) = 1/2; P1(z_1 | l, n) = P1(z_1 | \bar{l}, \bar{n}) = 3/4, P1(z_1 | l, \bar{n}) = P1(z_1 | \bar{l}, n) = 1/4; P1(z_2 | p, o) = P1(z_2 | p, \bar{o}) = P1(z_2 | \bar{p}, o) = 1/2, P1(z_2 | \bar{p}, \bar{o}) = 3/4; P1(S=1 | r, q) = P1(S=1 | \bar{r}, \bar{q}) = 1/2, P1(S=1 | r, \bar{q}) = 1/4, P1(S=1 | \bar{r}, q) = 3/4; P1(l | x) = 1/2 + ε1/2, P1(l | \bar{x}) = 1/2 − ε1/2; P1(r | n) = 1/2 + ε2/2, P1(r | \bar{n}) = 1/2 − ε2/2; P1(q | p) = 1/2 + ε3/2, P1(q | \bar{p}) = 1/2 − ε3/2; P1(o | y') = 1/2 + ε4/2, P1(o | \bar{y}') = 1/2 − ε4/2, where εi = (1/5)^{ki}, and k1 is the length of the path from X to L, k2 the length of the path from N to R, k3 the length of the path from P to Q, and k4 the length of the path from Y' to O.

Calculating both Q1 and Q2 with this parametrization, we obtain:

    Q1 = γ/2

    Q2 = (γ/2) (  14ε3ε4 / [(ε1(7ε2 + ε2ε3) + 56)(ε3 + 7)]
                − 14ε3ε4 / [(ε1(7ε2 + ε2ε3) − 56)(ε3 + 7)]
                + (−2ε3 − ε3ε4 + ε3²ε4 − ε3² + 63) / [(ε3 + 7)(ε3 − 9)]
                − 18ε3ε4 / [(ε1(9ε2 − ε2ε3) − 72)(ε3 − 9)]
                + 18ε3ε4 / [(ε1(9ε2 − ε2ε3) + 72)(ε3 − 9)] )

Q1 and Q2 coincide only when one of the εi, i = 1, 2, 3, 4, equals 0, which is never possible in this parametrization, since every εi = (1/5)^{ki} and every ki > 0.
Suppose condition (c) does not hold; then there is an open path between Y and S in G_{\bar{X}}. The following are the cases in which Y' may violate cond. (c). Figure 7 illustrates the structure of the cases graphically.

Figure 7: Cases considered for the necessity of condition (c) in the proof for Thm. 1; dotted directed arrows indicate chains of arbitrary length in the graph. (a) Case 1: X, Y', S. (b) Case 2: X, Y', R, S. (c) Case 3: X, Y', Z', R, S. (d) Case 4: X, Y', Q, N, R, S. (e) Case 5: X, Y', Q, Z', R, S. (f) Case 6: X, Y', S.

case 1: Y' ∈ Pa_S.
Let W be the set of nodes connecting X and Y' with directed paths. Consider the induced subgraph G' where all nodes in V \ {X, W, Y', S} are disconnected from {X, W, Y', S}. It must be the case that Z and W are disjoint, else condition (a) is violated. Consequently, every Z is disconnected, and (Z ⊥⊥ Y')_{P1} holds. With M1 and M2 constructed from G', the adjustment formula in the second model can be expressed as:

    Q2 = γ ∑_{z} P2(y' | x, z) P2(z)
       = γ ∑_{z} P1(y' | x, z, S=1) P1(z | S=1)
       = γ ∑_{z} P1(y' | x, S=1) P1(z | S=1)
       = γ P1(y' | x, S=1)
       = γ P1(y', x, S=1) / ∑_{y'} P1(y', x, S=1)
       = γ P1(S=1 | y') P1(y' | x) / [ P1(S=1 | y') P1(y' | x) + P1(S=1 | \bar{y}') P1(\bar{y}' | x) ]

Using Lemma 6, let P1(S=1 | y') = α and P1(S=1 | \bar{y}') = β with 0 < α, β < 1 and α ≠ β. Proceed with Lemma 8 (p = q = 1/2) to define P(y' | x) = 1/2. The previous expression becomes:

    Q2 = γ α / (α + β)

Following a similar derivation, it can be established that Q1 = γ/2, which is never equal to Q2 in this parametrization.
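The case-1 computation can be replayed by exhaustively enumerating the joint over (Y', S): with P1(y' | x) = 1/2 and S a child of Y' alone, the biased query collapses to α/(α+β). The snippet below is only a numerical check of that step; the parameter values are arbitrary.

```python
from itertools import product

def p_yprime_given_s1(alpha, beta):
    """P1(y' | S=1) in the case-1 model: P1(y') = 1/2 (so conditioning on x
    is immaterial), P1(S=1 | y') = alpha, P1(S=1 | not y') = beta."""
    joint = {}                                  # joint over (y', s)
    for y, s in product((0, 1), repeat=2):
        p_s1 = alpha if y else beta
        joint[(y, s)] = 0.5 * (p_s1 if s else 1 - p_s1)
    return joint[(1, 1)] / (joint[(1, 1)] + joint[(0, 1)])

alpha, beta = 0.3, 0.6
q2_over_gamma = p_yprime_given_s1(alpha, beta)
assert abs(q2_over_gamma - alpha / (alpha + beta)) < 1e-12
assert abs(q2_over_gamma - 0.5) > 1e-3   # Q2 differs from Q1 = gamma/2
```

The same enumeration with alpha = beta returns exactly 1/2, confirming that the gap exists only when S actually depends on Y'.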
case 2: There is a directed path p from Y' to S without any Z.
Let R be the parent of S in such a path, and let W1 be the set of nodes between X and Y' as in the previous case. Similarly, let W2 be the variables in the path from Y' to R. Now, consider the graph G' where all nodes except {X, W1, Y', W2, R, S} are disconnected from {X, W1, Y', W2, R, S}. Proceeding as in the previous case, with the consideration that Z is disconnected from the rest of the graph, yields:

    Q2 = γ P1(y' | x, S=1) = γ P1(y', x, S=1) / P1(x, S=1)

The numerator can be rewritten as:

    P1(y', x, S=1) = ∑_{r} P1(y', x, r, S=1) = ∑_{r} P1(x) P1(y' | x) P1(r | y') P1(S=1 | r)

Factorizing the denominator analogously, the term P1(x) is the same and can be cancelled out; then Q2 becomes:

    Q2 = γ P1(y' | x) ∑_{r} P1(r | y') P1(S=1 | r) / ∑_{y'} [ P1(y' | x) ∑_{r} P1(r | y') P1(S=1 | r) ]

Using Lemma 8 to set P1(r | y') = 1/2 + ε/2 and P1(r | \bar{y}') = 1/2 − ε/2, where ε = (1/5)^k (using p = 3/5, q = 2/5), and defining P(S=1 | r) = 2/3 and P(S=1 | \bar{r}) = 1/2, leads to Q2 = γ(1/2 + ε/14) and Q1 = γ/2, which are never equal.
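The constant ε/14 can be confirmed by enumerating the case-2 chain Y' → R → S. This is a verification sketch, not part of the construction; the P1(x) factor cancels, so it is omitted.

```python
from itertools import product

def q2_case2(eps):
    """P1(y' | x, S=1) in the case-2 model: P1(y' | x) = 1/2,
    P1(r | y') = 1/2 + eps/2, P1(r | not y') = 1/2 - eps/2,
    P1(S=1 | r) = 2/3, P1(S=1 | not r) = 1/2."""
    num = den = 0.0
    for y, r in product((0, 1), repeat=2):
        p_r1 = 0.5 + eps / 2 if y else 0.5 - eps / 2
        p_r = p_r1 if r else 1 - p_r1
        p_s = 2 / 3 if r else 1 / 2
        w = 0.5 * p_r * p_s        # P1(y', r, S=1) up to the cancelled P1(x)
        den += w
        if y:
            num += w
    return num / den

# The enumeration matches 1/2 + eps/14 for every eps = (1/5)^k.
for k in (1, 2, 5):
    eps = 0.2 ** k
    assert abs(q2_case2(eps) - (0.5 + eps / 14)) < 1e-12
```

Since ε = (1/5)^k is strictly positive for every finite chain length k, Q2 never collapses back to Q1 = γ/2.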
case 3: There is a directed path p from Y' to S that contains some Z' ∈ Z.
Let R be the parent of S in such a path. It must be the case that X and Y' are not connected by any causal path; otherwise Z' would violate condition (a). Let W1 be the nodes in the subpath between Y' and Z', and W2 those between Z' and R. Consider the graph G' where all nodes except {X, Y', W1, Z', W2, R, S} are disconnected from {X, Y', W1, Z', W2, R, S}. Every variable in Z0 = Z \ {Z'} is disconnected from the rest of the graph; then:

    Q2 = γ ∑_{z} P1(y' | x, z, S=1) P1(z | S=1)
       = γ ∑_{z'} ∑_{z0} P1(y' | x, z', z0, S=1) P1(z', z0 | S=1)
       = γ ∑_{z'} P1(y' | x, z', S=1) P1(z' | S=1)
       = γ ∑_{z'} P1(y' | x, z') P1(z' | S=1)
       = γ ∑_{z'} [ P1(y', x, z') / ∑_{y'} P1(y', x, z') ] P1(z' | S=1)

The numerator of the fraction in the last expression is equal to:

    P1(y', x, z') = P1(x) P1(y') P1(z' | y')

A similar factorization can be employed for the denominator, as well as for Q1. The factor P1(x) appears in both parts of the fractions and can be cancelled:

    Q1 = γ ∑_{z'} [ P1(y') P1(z' | y') / ∑_{y'} P1(y') P1(z' | y') ] P1(z')

    Q2 = γ ∑_{z'} [ P1(y') P1(z' | y') / ∑_{y'} P1(y') P1(z' | y') ] P1(z' | S=1)

Now, P1(z') and P1(z' | S=1) are derived in similar terms:

    P1(z') = ∑_{y'} P1(z', y') = ∑_{y'} P1(y') P1(z' | y')

    P1(z' | S=1) = P1(z', S=1) / P1(S=1) = P1(S=1 | z') ∑_{y'} P1(y') P1(z' | y') / P1(S=1)

Replacing P1(z') and P1(z' | S=1) in Q1 and Q2, then simplifying:

    Q1 = γ ∑_{z'} P1(y') P1(z' | y') = γ P1(y')

    Q2 = γ ∑_{z'} P1(y') P1(z' | y') P1(S=1 | z') / P1(S=1) = γ P1(y', S=1) / P1(S=1) = γ P1(y') P1(S=1 | y') / ∑_{y'} P1(S=1 | y') P1(y')

The term P1(S=1 | y') = ∑_{r} P1(S=1 | r) P1(r | y'). Lemma 8 can be employed exactly as in the previous case, and P1(y') can be defined directly since it has no parents, for instance P1(y') = 1/2; then the queries end up as:

    Q1 = γ/2    Q2 = γ(1/2 + ε/14)
which are never equal.

case 4: There is a path p connecting Y' and S that goes through an ancestor of both and does not contain any node in Z.
Let N be the closest common ancestor of Y' and S. Let R be the parent of S, and Q the parent of Y', in the mentioned path. Let W1 be the set of nodes between X and Y'. Let W2 and W3 be the nodes in the paths from N to Q and from N to R, respectively. Consider the graph G' where the arrows in the subpath from N to Q are reversed and all nodes except {X, W1, Y', Q, W2, N, W3, R, S} are disconnected from {X, W1, Y', Q, W2, N, W3, R, S}. Any model constructed for G' can be translated to a model compatible with G using Lemma 7. Following the same derivation as in case 2 (taking into account that Z is disconnected from the rest of the graph) yields:

    Q2 = γ P1(y', x, S=1) / ∑_{y'} P1(y', x, S=1)

The numerator of the last expression can be rewritten as:

    P1(y', x, S=1) = ∑_{q} P1(y', x, q, S=1) = ∑_{q} P1(x) P1(y' | x, q) P1(q) P1(S=1 | q)

By rewriting the denominator similarly, the term P1(x) appearing in both vanishes; then the queries become:

    Q1 = γ ∑_{q} P1(y' | x, q) P1(q) / ∑_{y', q} P1(y' | x, q) P1(q)

    Q2 = γ ∑_{q} P1(y' | x, q) P1(q) P1(S=1 | q) / ∑_{y', q} P1(y' | x, q) P1(q) P1(S=1 | q)

Lemma 8 can be employed to set P1(r | q) = 1/2 + ε/2 and P1(r | \bar{q}) = 1/2 − ε/2, where ε = (1/5)^k (using p = 3/5, q = 2/5). Define P(S=1 | r) = 2/3 and P(S=1 | \bar{r}) = 1/2, and calculate P(S=1 | q) as ∑_{r} P1(r | q) P1(S=1 | r). Also by Lemma 8, let P1(y' | q, x) = P1(y' | q, \bar{x}) = 1/2 and P1(y' | \bar{q}, x) = P1(y' | \bar{q}, \bar{x}) = 1/4, with P1(q) = 1/2 (Q is a root in G'). This leads to:

    Q1 = 3γ/8    Q2 = γ(3/8 + ε/56)
which are never equal.

case 5: There is a path p connecting Y' and S that goes through an ancestor of both and contains some Z' ∈ Z.
Let N, Q, R be defined as in the previous case, and construct G' the same way. Following the same derivation strategy as in case 3, the query expressions become:

    Q1 = γ ∑_{z'} [ ∑_{q} P1(y' | x, q) P1(q) P1(z' | q) / ∑_{y', q} P1(y' | x, q) P1(q) P1(z' | q) ] P1(z')

    Q2 = γ ∑_{z'} [ ∑_{q} P1(y' | x, q) P1(q) P1(z' | q) / ∑_{y', q} P1(y' | x, q) P1(q) P1(z' | q) ] P1(z' | S=1)

Now, P1(z') and P1(z' | S=1) are derived in similar terms:

    P1(z') = ∑_{q} P1(z', q) = ∑_{q} P1(q) P1(z' | q)

    P1(z' | S=1) = ∑_{r} P1(S=1, z', r) / ∑_{z', r} P1(S=1, z', r) = P1(z') ∑_{r} P1(r | z') P1(S=1 | r) / ∑_{z'} P1(z') ∑_{r} P1(r | z') P1(S=1 | r)

Use Lemma 8 to parametrize P1(r | z') = 1/2 + ε1/2, P1(r | \bar{z}') = 1/2 − ε1/2, P(z' | q) = 1/2 + ε2/2, P(z' | \bar{q}) = 1/2 − ε2/2, where εi = (1/5)^{ki}, i ∈ {1, 2} (using p = 3/5, q = 2/5 in both cases). Define P(S=1 | r) = 2/3 and P(S=1 | \bar{r}) = 1/2. Also by Lemma 8, let P1(y' | q, x) = P1(y' | q, \bar{x}) = 1/2 and P1(y' | \bar{q}, x) = P1(y' | \bar{q}, \bar{x}) = 1/4. The queries end up as:

    Q1 = 3γ/8    Q2 = γ(3/8 + ε1ε2/56)

which are never equal.

case 6: There is a confounding path between Y' and S consisting of unobservable variables.
The models for this case can be constructed as in case 4, then moving the variables in the path from Q to R (inclusive) from the set of observables to the set of unobservables.
Now, suppose condition (d) does not hold. It should be the case that there exist some X ∈ X ∩ An_S and a Y' ∈ Y that are connected through a back-door path p that would usually be blocked by some Z' ∈ Z. There are two possible cases (depicted in Fig. 8) not contradicting the previous conditions:

case 1: Z' is an ancestor of X and Y', and S is a descendant of X.
Let W1 be the nodes in the path between Z' and X, W2 those between Z' and Y', and W3 those between X and S. As in previous cases, consider the graph G' where all nodes but {X, Z', Y', W1, W2, W3} are disconnected from this set. Also, suppose X and Y' are not connected by any path not going through Z'. The queries in the corresponding models can be expressed as:

    Q1 = γ ∑_{z'} P1(y' | x, z') P1(z') = γ ∑_{z'} P1(y' | z') P1(z') = γ P1(y')

    Q2 = γ ∑_{z'} P1(y' | x, z', S=1) P1(z' | S=1) = γ ∑_{z'} P1(y' | z') P1(z' | S=1)

The term P1(z' | S=1) is available as:

    P1(z' | S=1) = P1(z') ∑_{x} P1(x | z') P(S=1 | x) / ∑_{x, z'} P1(z') P1(x | z') P(S=1 | x)
Figure 8: Cases considered for the necessity of condition (d) in the proof for Thm. 1; dotted directed arrows indicate chains of arbitrary length in the graph. (a) Case 1: Z' is an ancestor of X and Y', and S is a descendant of X. (b) Case 2: Z' is an ancestor of X and a descendant of Y', and S is a descendant of X.

Let P1(z') = 1/2; P1(y' | z') = 1/2 + ε1/2, P(y' | \bar{z}') = 1/2 − ε1/2; P1(x | z') = 1/2 + ε2/2, P(x | \bar{z}') = 1/2 − ε2/2. Also, P1(S=1 | x) = 1/2 + ε3/2 and P1(S=1 | \bar{x}) = 1/2 − ε3/2, where εi = (1/5)^{ki}, i = 1, 2, 3 (using Lemma 8 with p = 3/5, q = 2/5 in all cases):

    Q1 = γ/2    Q2 = γ(1/2 + ε1ε2ε3/2)

which are never equal.

case 2: Z' is an ancestor of X and a descendant of Y', and S is a descendant of X.
In this case there are no causal paths between X and Y'; otherwise Z' violates condition (a). Lemma 7 can be used to change the direction of the edges in the path from Y' to Z' while staying in the same equivalence class; then the same parametrization from the previous case applies.
Definition 9 (Generalized Adjustment Criterion Type 2). Let T be the set of variables measured without selection bias in the overall population. Also, let X, Y, Z be disjoint sets of variables in a causal model with diagram G, augmented with the selection mechanism S, where Z ⊆ T. Then, Z satisfies the generalized criterion relative to X, Y if:

(a) No element in Z is a descendant in G_{\bar{X}} of any W ∉ X which lies on a proper causal path from X to Y.
(b) All non-causal X–Y paths in G are blocked by Z.
(c) Y is independent of the selection mechanism S given Z and X, i.e., (Y ⊥⊥ S | X, Z).

Theorem 3 (Generalized Adjustment Formula Type 2). Let T be the set of variables measured without selection bias. Given disjoint sets of variables X, Y and Z ⊆ T and a causal diagram G, then, for every model inducing G, the effect P(y | do(x)) is given by

    P(y | do(x)) = ∑_{z} P(y | x, z, S=1) P(z)        (34)

if and only if the set Z satisfies the generalized adjustment criterion type 2 relative to the pair X, Y.
Proof. (if) Suppose the set Z satisfies the conditions relative to X, Y. Then, by conditions (a) and (b), for every model induced by G we have:

    P(y | do(x)) = ∑_{z} P(y | x, z) P(z)

We note that S can be introduced into the first term by cond. (c), which entails Eq. (34).

(Only if) We prove this by the contrapositive. In case conditions (a) or (b) do not hold, the same argument as in the proof for the previous type applies here.

To argue the necessity of condition (c), we show that the adjustment formula is not recoverable. Let V represent all variables in the graph except for the selection mechanism S. We construct two models with distributions P1 and P2, compatible with G, such that they agree on the probability distribution under selection bias,

    P1(v | S=1) = P2(v | S=1),        (35)

and on the unbiased distribution over Z,

    P1(z) = P2(z),        (36)

but Q1 in the first model is a different distribution than Q2 in the second model. Let P1 be compatible with G and P2 compatible with G_{\bar{S}} such that (V ⊥⊥ S)_{P2}. Recoverability should hold for any parametrization; hence, without loss of generality, we describe Markovian models and assume that every variable is binary. In every construction we parametrize P1 through its factors and then parametrize P2 to enforce (35) and (36). Moreover, the quantity in (35) is also equal to P2(v).

Suppose condition (c) does not hold; then there is an open path between Y and S not blocked when Z is observed. As in the proof for the previous type, we fix any Y' ∈ Y not satisfying the condition and consider the query

    Q = γ ∑_{z} P(y' | x, z) P(z),

where γ represents the product of the marginal distributions of the remaining Y.

Now, let us consider every possible scenario in which condition (c) may be unsatisfied. Fig. 9 illustrates every case for easier reference.
case 1: Y' ∈ Pa_S.
Proceed exactly as in case 1 of the proof for Thm. 1.

case 2: There is a directed path p from Y' to S.
Consider the same naming convention as in case 2 of Thm. 1. It must be the case that ({R} ∪ W2) ∩ Z = ∅ for condition (c) to be violated; therefore the same parametrization of the mentioned case applies here.

Figure 9: Ways in which Y' and S can be connected; dotted paths indicate that they may contain an arbitrary number of variables. (a) Case 1: X, U, Y', S. (b) Case 2: X, U, Y', R, W, S. (c) Case 3: X, U, Y', W, R, Q, S. (d) Case 4: X, Y', Z', L, S. (e) Case 5: X, U, Y', L, R, S. (f) Case 6: X, U, Y', R, Z', L, S.

case 3: The path goes through a common ancestor of Y' and S with no colliders.
This reduces to case 4 in the proof for Thm. 1.

case 4: There is a non-directed path from Y' to S with some Z' ∈ Z as a collider.
X must be disconnected from Y'; otherwise condition (a) is violated, since Z' is a descendant of Y', which lies on every causal path from X to Y'. Let L be the common ancestor of S and Z'. Without loss of generality, assume that L is directly connected to Z'; if this were not the case, we could apply Lemma 8 with p = 1/2, q = 1/2 and use the same parametrization below. In this model the causal effect P(y' | do(x)) = P(y'); however, the adjustment formula will not be equal to this effect in every model compatible with the graph. To see this, consider a model where every Z but Z' is also disconnected in the graph; we have:
    Q = γ ∑_{z} P(y' | x, z, S=1) P(z)
      = γ ∑_{z'} ∑_{z0} P(y' | x, z', z0, S=1) P(z', z0)
      = γ ∑_{z'} P(y' | x, z', S=1) P(z')
      = γ ∑_{z'} [ P(y', x, z', S=1) / ∑_{y'} P(y', x, z', S=1) ] P(z')

We can write the numerator of the fraction in the last expression as:

    P(y', x, z', S=1) = ∑_{l} P(y', x, z', l, S=1) = P(x) ∑_{l} P(y') P(l) P(z' | y', l) P(S=1 | l)

Let α_L(y') = P(y') P(l), f(y', l, z') = P(z' | y', l) P(S=1 | l), and g(y', l, z') = P(z' | y', l). We have:

    Q = γ ∑_{z'} P(x) ∑_{l} α_L(y') f(y', l, z') / [ P(x) ∑_{l, y'} α_L(y') f(y', l, z') ] P(z')
      = γ ∑_{z'} ∑_{l} α_L(y') f(y', l, z') / [ ∑_{l, y'} α_L(y') f(y', l, z') ] P(z')

Consider the parametrization: P1(y') = P1(l) = 1/2; P1(z' | y', l) = 1/2 + ε, P1(z' | y', \bar{l}) = 1/2 − ε, P1(z' | \bar{y}', l) = P1(z' | \bar{y}', \bar{l}) = 1/2, for 0 < ε < 1/2. Let P1(S=1 | l) = α and P1(S=1 | \bar{l}) = β, and pick any 0 < α, β < 1. We can calculate P(z') as:

    P(z') = ∑_{y', l} P(z' | y', l) P(y') P(l) = 1/2
Computing the queries with the given parametrization, we obtain:

    P(y | do(x)) = γ/2

    Q = (γ/2) [ (α+β)² − 2ε²(α−β)² ] / [ (α+β)² − ε²(α−β)² ]
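The case-4 claim can be confirmed by brute-force enumeration of the collider model Y' → Z' ← L → S under the parametrization above. This is a numerical check only; the parameter values are arbitrary.

```python
from itertools import product

def adjustment_query(alpha, beta, eps):
    """sum_{z'} P(y' | x, z', S=1) P(z') in the case-4 model: X disconnected,
    P(y') = P(l) = 1/2, P(S=1 | l) = alpha, P(S=1 | not l) = beta, and Z' a
    collider with P(z' | y', l) = 1/2+eps, P(z' | y', not l) = 1/2-eps,
    P(z' | not y', .) = 1/2."""
    def p_z1(y, l):
        if y:
            return 0.5 + eps if l else 0.5 - eps
        return 0.5
    joint = {}                                  # P(y', z', S=1), L summed out
    for y, l, z in product((0, 1), repeat=3):
        pz = p_z1(y, l) if z else 1 - p_z1(y, l)
        ps = alpha if l else beta
        joint[(y, z)] = joint.get((y, z), 0.0) + 0.25 * pz * ps
    # P(z') = 1/2 for both values of z' under this parametrization
    return sum(0.5 * joint[(1, z)] / (joint[(0, z)] + joint[(1, z)])
               for z in (0, 1))

alpha, beta, eps = 0.7, 0.2, 0.25
s, d = alpha + beta, eps * (alpha - beta)
closed_form = 0.5 * (s * s - 2 * d * d) / (s * s - d * d)
assert abs(adjustment_query(alpha, beta, eps) - closed_form) < 1e-12
assert abs(adjustment_query(alpha, beta, eps) - 0.5) > 1e-4  # != P(y'|do(x))
```

When alpha = beta (selection independent of L) the query returns exactly 1/2 = P(y' | do(x)), so the gap is driven entirely by the open collider path to S.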
Q is always different from P(y | do(x)) for α ≠ β.

case 5: There is a path between Y' and S passing through an ancestor of Y' and having some X' ∈ X as a collider.
Such a path would have arrows incoming to X'. But then it is also an open non-causal path between Y' and X', violating condition (b).
case 6: There is a path p between Y' and S passing through an ancestor R of Y' and having some Z' ∈ Z as a collider.
Let us consider the variables X, Y', R, Z', L, such that L is a common ancestor of Z' and S that, together with R, has converging arrows into Z'. The path p can be seen as the concatenation of four segments p1, ..., p4, such that p1 is the segment R, ..., Y'; p2 is the segment R, ..., Z'; p3 is the segment L, ..., Z'; and p4 is the segment L, ..., S. Note that, by construction, there can only be chains along each of these segments; without loss of generality, we assume that those are segments of length one, but it is trivial to stretch them using Lemma 8 (with p = q = 1/2) as in previous cases. When there are multiple Z''s in p, we will have the concatenation of several segments p2 and p3, and it will also be simple to extend the construction. We use a parametrization similar to the previous cases; here Z0 = Z \ {Z'}:
    Q2 = γ ∑_{z} P1(y' | x, z, S=1) P1(z)
       = γ ∑_{z'} ∑_{z0} P1(y' | x, z', z0, S=1) P1(z', z0)
       = γ ∑_{z'} P1(y' | x, z', S=1) ∑_{z0} P1(z', z0)
       = γ ∑_{z'} P1(y' | x, z', S=1) P1(z')
       = γ ∑_{z'} [ P1(y', x, z', S=1) / ∑_{y'} P1(y', x, z', S=1) ] P1(z')

We can write the numerator of the fraction in the last expression as:

    P1(y', x, z', S=1) = ∑_{r, l} P1(y', x, z', r, l, S=1)
                       = ∑_{r, l} P1(x) P1(y' | r) P1(r) P1(z' | r, l) P1(S=1 | l) P1(l)
                       = P1(x) ∑_{r} P1(y' | r) P1(r) ∑_{l} P1(z' | r, l) P1(S=1 | l) P1(l)

Let α_R(y') = P1(y' | r) P1(r), f(r, z') = ∑_{l} P1(z' | r, l) P1(S=1 | l) P1(l), and g(r, z') = ∑_{l} P1(z' | r, l) P1(l). We have:

    Q2 = γ ∑_{z'} P1(x) ∑_{r} α_R(y') f(r, z') / [ P1(x) ∑_{r, y'} α_R(y') f(r, z') ] P1(z')
       = γ ∑_{z'} ∑_{r} α_R(y') f(r, z') / [ ∑_{r, y'} α_R(y') f(r, z') ] P1(z')
Consider the following parametrization: P1(r) = P1(l) = 1/2; P1(y' | r) = 1/2 + ε, P1(y' | \bar{r}) = 1/2 − ε; P1(z' | r, l) = 1/2 + ε, P1(z' | r, \bar{l}) = 1/2 − ε, P1(z' | \bar{r}, l) = P1(z' | \bar{r}, \bar{l}) = 1/2, for 0 < ε < 1/2. Let P1(S=1 | l) = α and P1(S=1 | \bar{l}) = β, and pick any 0 < α, β < 1. We can calculate P1(z') as:

    P1(z') = ∑_{r, l} P1(z' | r, l) P1(r) P1(l) = 1/2

Computing the queries with the given parametrization, we obtain:

    Q1 = γ/2

    Q2 = (γ/2) ( 1 − 2ε³(α−β)² / [ (α+β)² − ε²(α−β)² ] )

Q2 is always different from Q1 for α ≠ β and 0 < ε < 1/2.