arXiv:1107.5552v1 [math.ST] 27 Jul 2011

HALF-TREK CRITERION FOR GENERIC IDENTIFIABILITY OF LINEAR STRUCTURAL EQUATION MODELS

RINA FOYGEL, JAN DRAISMA, AND MATHIAS DRTON

Abstract. A linear structural equation model relates random variables of interest and corresponding Gaussian noise terms via a linear equation system. Each such model can be represented by a mixed graph in which directed edges encode the linear equations, and bidirected edges indicate possible correlations among noise terms. We study parameter identifiability in these models, that is, we ask for conditions that ensure that the edge coefficients and correlations appearing in a linear structural equation model can be uniquely recovered from the covariance matrix of the associated normal distribution. We treat the case of generic identifiability, where unique recovery is possible for almost every choice of parameters. We give a new graphical criterion that is sufficient for generic identifiability. It improves criteria from prior work and does not require the directed part of the graph to be acyclic. We also develop a related necessary condition and examine the “gap” between sufficient and necessary conditions through simulations as well as exhaustive algebraic computations for graphs with up to five nodes.

1. Introduction

When modeling the joint distribution of a random vector $X = (X_1, \dots, X_m)^T$, it is often natural to appeal to noisy functional relationships. In other words, each variable $X_w$ is assumed to be a function of the remaining variables and a stochastic noise term $\varepsilon_w$. The resulting models are known as linear structural equation models when the relationship is linear, that is, when

\[ X_w = \lambda_{0w} + \sum_{v \neq w} \lambda_{vw} X_v + \varepsilon_w, \qquad w = 1, \dots, m, \tag{1.1} \]

or, in vectorized form with a matrix $\Lambda = (\lambda_{vw})$ that is tacitly assumed to have zeros along the diagonal,

\[ X = \lambda_0 + \Lambda^T X + \varepsilon. \tag{1.2} \]

The classical distributional assumption is that the error vector $\varepsilon = (\varepsilon_1, \dots, \varepsilon_m)^T$ has a multivariate normal distribution with zero mean and some covariance matrix $\Omega = (\omega_{vw})$. Writing $I$ for the identity matrix, it follows that $X$ has a multivariate normal distribution with mean vector $(I - \Lambda)^{-T} \lambda_0$ and covariance matrix

\[ \Sigma = (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1}. \tag{1.3} \]

Background on structural equation modeling can be found, for instance, in [Bol89]. As emphasized in [SGS00, Pea00], their great popularity in applied sciences is due to the natural causal interpretation of the involved functional relationships.

Key words and phrases. Covariance matrix, Gaussian distribution, graphical model, multivariate normal distribution, parameter identification, structural equation model.


Figure 1. Mixed graph for the instrumental variable model, with directed edges $1 \to 2$ and $2 \to 3$ and bidirected edge $2 \leftrightarrow 3$.

Interesting models are obtained by imposing some pattern of zeros among the coefficients $\lambda_{vw}$ and the covariances $\omega_{vw}$. It is convenient to think of the zero patterns as being associated with a mixed graph that contains directed edges $v \to w$ to indicate possibly non-zero coefficients $\lambda_{vw}$ and bidirected edges $v \leftrightarrow w$ when $\omega_{vw}$ is a possibly non-zero covariance; in figures we draw the bidirected edges dashed for better distinction. Mixed graph representations were first advocated in [Wri21, Wri34] and are also known as path diagrams. We briefly illustrate this in the next example, which gives the simplest version of what are often referred to as instrumental variable models; see also [DMS10].

Example 1 (IV). Suppose that, as in [ER99], we record an infant’s birth weight ($X_3$), the level of maternal smoking during pregnancy ($X_2$), and the cigarette tax rate that applies ($X_1$). A model of interest, with mixed graph in Figure 1, assumes

\[ X_1 = \lambda_{01} + \varepsilon_1, \qquad X_2 = \lambda_{02} + \lambda_{12} X_1 + \varepsilon_2, \qquad X_3 = \lambda_{03} + \lambda_{23} X_2 + \varepsilon_3, \]

with an error vector $\varepsilon$ that has zero mean vector and covariance matrix

\[ \Omega = \begin{pmatrix} \omega_{11} & 0 & 0 \\ 0 & \omega_{22} & \omega_{23} \\ 0 & \omega_{23} & \omega_{33} \end{pmatrix}. \]

The possibly non-zero entry $\omega_{23}$ can absorb the effects that unobserved confounders (such as age, income, genetics, etc.) may have on both $X_2$ and $X_3$; compare [RS02, Wer11] for background on mixed graph representations of latent variable problems.

Formally, a mixed graph is a triple $G = (V, D, B)$ where $V$ is a finite set of nodes and $D, B \subseteq V \times V$ are two sets of edges. In our context, the nodes correspond to the random variables $X_1, \dots, X_m$, and we simply let $V = [m] := \{1, \dots, m\}$. The pairs $(v, w)$ in the set $D$ represent directed edges and we will always write $v \to w$; $v \to w \in D$ does not imply $w \to v \in D$. The pairs in $B$ are bidirected edges $v \leftrightarrow w$; they have no orientation, that is, $v \leftrightarrow w \in B$ if and only if $w \leftrightarrow v \in B$. Neither the bidirected part $(V, B)$ nor the directed part $(V, D)$ contains self-loops, that is, $v \to v \notin D$ and $v \leftrightarrow v \notin B$ for all $v \in V$. If the directed part $(V, D)$ does not contain directed cycles (that is, no cycle $v \to \cdots \to v$ can be formed from the edges in $D$), then the mixed graph $G$ is said to be acyclic.

Let $\mathbb{R}^D$ be the set of real $m \times m$ matrices $\Lambda = (\lambda_{vw})$ with support $D$, that is, $\lambda_{vw} = 0$ if $v \to w \notin D$. Write $\mathbb{R}^D_{\mathrm{reg}}$ for the subset of matrices $\Lambda \in \mathbb{R}^D$ for which $I - \Lambda$ is invertible, where $I$ denotes the identity matrix. (If $G$ is acyclic, then $\mathbb{R}^D = \mathbb{R}^D_{\mathrm{reg}}$; see the remark after equation (2.3).) Similarly, let $\mathit{PD}_m$ be the cone of positive definite symmetric $m \times m$ matrices $\Omega = (\omega_{vw})$ and define $\mathit{PD}(B) \subset \mathit{PD}_m$ to be the subcone of matrices with support $B$, that is, $\omega_{vw} = 0$ if $v \neq w$ and $v \leftrightarrow w \notin B$.

Definition 1. The linear structural equation model given by a mixed graph $G = (V, D, B)$ on $V = [m]$ is the family of all $m$-variate normal distributions with covariance matrix

\[ \Sigma = (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1} \]

for $\Lambda \in \mathbb{R}^D_{\mathrm{reg}}$ and $\Omega \in \mathit{PD}(B)$.


The first question that arises when specifying a linear structural equation model is whether the model is identifiable in the sense that the parameter matrices $\Lambda \in \mathbb{R}^D_{\mathrm{reg}}$ and $\Omega \in \mathit{PD}(B)$ can be uniquely recovered from the normal distribution they define. Clearly, this is equivalent to asking whether they can be recovered from the distribution’s covariance matrix, and thus we ask whether the fiber

\[ \mathcal{F}(\Lambda, \Omega) = \{ (\Lambda', \Omega') \in \Theta : \phi_G(\Lambda', \Omega') = \phi_G(\Lambda, \Omega) \} \tag{1.4} \]

is equal to $\{(\Lambda, \Omega)\}$. Here, we introduced the shorthand $\Theta := \mathbb{R}^D_{\mathrm{reg}} \times \mathit{PD}(B)$. Put differently, identifiability holds if the parametrization map

\[ \phi_G : (\Lambda, \Omega) \mapsto (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1} \tag{1.5} \]

is injective on $\Theta$, or a suitably large subset.

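Example 2 (IV, continued). In the instrumental variable model associated with the graph in Figure 1,

\[ \Sigma = (\sigma_{vw}) = \begin{pmatrix} 1 & -\lambda_{12} & 0 \\ 0 & 1 & -\lambda_{23} \\ 0 & 0 & 1 \end{pmatrix}^{-T} \begin{pmatrix} \omega_{11} & 0 & 0 \\ 0 & \omega_{22} & \omega_{23} \\ 0 & \omega_{23} & \omega_{33} \end{pmatrix} \begin{pmatrix} 1 & -\lambda_{12} & 0 \\ 0 & 1 & -\lambda_{23} \\ 0 & 0 & 1 \end{pmatrix}^{-1} \]

\[ = \begin{pmatrix} \omega_{11} & \omega_{11}\lambda_{12} & \omega_{11}\lambda_{12}\lambda_{23} \\ \omega_{11}\lambda_{12} & \omega_{22} + \omega_{11}\lambda_{12}^2 & \omega_{23} + \lambda_{23}\sigma_{22} \\ \omega_{11}\lambda_{12}\lambda_{23} & \omega_{23} + \lambda_{23}\sigma_{22} & \omega_{33} + 2\omega_{23}\lambda_{23} + \lambda_{23}^2\sigma_{22} \end{pmatrix}. \]

Despite the presence of both the edges $2 \to 3$ and $2 \leftrightarrow 3$, we can recover $\Lambda$ (and thus also $\Omega$) from $\Sigma$ using that

\[ \lambda_{12} = \frac{\sigma_{12}}{\sigma_{11}}, \qquad \lambda_{23} = \frac{\sigma_{13}}{\sigma_{12}}. \]

The first denominator $\sigma_{11}$ is always positive since $\Sigma$ is positive definite. The second denominator $\sigma_{12}$ is zero if and only if $\lambda_{12} = 0$. In other words, if the cigarette tax ($X_1$) has no effect on maternal smoking during pregnancy ($X_2$), then there is no way to distinguish between the causal effect of smoking on birth weight (coefficient $\lambda_{23}$) and the effects of confounding variables (coefficient $\omega_{23}$). Indeed the map $\phi_G$ is injective only on the subset of $\Theta$ with $\lambda_{12} \neq 0$.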
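The recovery formulas of Example 2 can be checked with exact rational arithmetic. The following is a minimal sketch, not taken from the paper: the helper names (`matmul`, `transpose`, `iv_covariance`, `recover_parameters`) and the parameter values are assumptions chosen purely for illustration. It builds $\Sigma$ from (1.3) for the graph of Figure 1, recovers $\lambda_{12}$ and $\lambda_{23}$ from the two ratios, and then obtains $\Omega = (I - \Lambda)^T \Sigma (I - \Lambda)$ by rearranging (1.3).

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def iv_covariance(l12, l23, w11, w22, w23, w33):
    """Sigma = (I - Lambda)^{-T} Omega (I - Lambda)^{-1} for the graph of Figure 1."""
    # Inverse of I - Lambda for the chain 1 -> 2 -> 3 (unit upper triangular).
    inv = [[1, l12, l12 * l23],
           [0, 1, l23],
           [0, 0, 1]]
    omega = [[w11, 0, 0],
             [0, w22, w23],
             [0, w23, w33]]
    return matmul(matmul(transpose(inv), omega), inv)

def recover_parameters(sigma):
    """Invert the parametrization using the two ratios from Example 2."""
    if sigma[0][1] == 0:
        raise ValueError("sigma_12 = 0: lambda_23 is not identified")
    l12 = sigma[0][1] / sigma[0][0]
    l23 = sigma[0][2] / sigma[0][1]
    i_minus_lambda = [[1, -l12, 0], [0, 1, -l23], [0, 0, 1]]
    # Omega = (I - Lambda)^T Sigma (I - Lambda), a rearrangement of (1.3).
    omega = matmul(matmul(transpose(i_minus_lambda), sigma), i_minus_lambda)
    return l12, l23, omega

# Hypothetical parameter values (l12, l23, w11, w22, w23, w33), illustration only.
params = [Fraction(x) for x in ("2", "-1/2", "1", "2", "1/2", "3")]
sigma = iv_covariance(*params)
l12, l23, omega = recover_parameters(sigma)
```

Setting the first parameter to zero makes `recover_parameters` raise, which mirrors the exceptional set $\lambda_{12} = 0$ discussed in the example.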

In this paper we study the kind of identifiability encountered in the instrumental variables example. The statistical literature often refers to this as almost-everywhere identifiability to express that the exceptional pairs $(\Lambda, \Omega)$ with fiber cardinality $|\mathcal{F}(\Lambda, \Omega)| > 1$ form a set of measure zero. However, since the map $\phi_G$ is rational, the exceptional sets are well-behaved null sets, namely, they are algebraic subsets. An algebraic subset $V \subset \Theta$ is a subset that can be defined by polynomial equations, and it is a proper subset of the open set $\Theta$ unless it is defined by the zero polynomial. A proper algebraic subset has smaller dimension than $\Theta$ (see [CLO07]), and thus also measure zero; statistical work often quotes the lemma in [Oka73] for the latter fact. These observations motivate the following definition and problem.

Definition 2. The mixed graph $G$ is said to be generically identifiable if $\phi_G$ is injective on the complement $\Theta \setminus V$ of a proper (i.e., strict) algebraic subset $V \subset \Theta$.

Problem 1. Characterize the mixed graphs $G$ that are generically identifiable.

Despite the long history of linear structural equation models, the problem just stated remains open, even when restricting to acyclic mixed graphs. However, in the last two decades a number of graphical conditions have been developed that are sufficient for generic identifiability. We refer the reader in particular to [Pea00], [BP02b], [BP06], [Tia09], and [CK10], which each contain many further references. To our knowledge, the condition that is of most general nature and most in the spirit of attempting to solve Problem 1 is the G-criterion of [BP06]. This criterion, and in fact all other mentioned work, uses linear algebraic techniques to solve the parametrized equation systems that define the fibers $\mathcal{F}(\Lambda, \Omega)$. Therefore, the G-criterion is in fact sufficient for the following stronger notion of identifiability, which we have seen to hold for the graph from Figure 1; recall the formulas given in Example 2.

Definition 3. The mixed graph $G$ is said to be rationally identifiable if there exists a proper algebraic subset $V \subset \Theta$ and a rational map $\psi$ such that $\psi \circ \phi_G(\Lambda, \Omega) = (\Lambda, \Omega)$ for all $(\Lambda, \Omega) \in \Theta \setminus V$.

The main results of our paper give a graphical condition that is sufficient for rational identifiability and that is strictly stronger than the G-criterion of [BP06] when applied to acyclic mixed graphs. However, the new condition, which we name the half-trek criterion, is also applicable to cyclic graphs, for which little prior work exists. The approach we take also yields a necessary condition, or more precisely put, a graphical condition that is sufficient for $G$ (or rather the map $\phi_G$) to be generically infinite-to-one. That is, the condition implies that the fiber $\mathcal{F}(\Lambda, \Omega)$ is infinite for all pairs $(\Lambda, \Omega)$ outside a proper algebraic subset of $\Theta$. If $|\mathcal{F}(\Lambda, \Omega)| \equiv h$ outside a proper algebraic subset, then we say that $G$ is generically $h$-to-one.

Our main results just described are stated in detail in Section 3 and proven in Sections 8 and 9. The comparison to the G-criterion is made in Section 4, with some proofs deferred to Section 10. Some interesting examples are visited in Section 5. Those include examples that do not seem to be covered by any known graphical criterion. These examples were found as part of an exhaustive study of the identifiability properties of all mixed graphs with up to 5 nodes. The study is based on techniques from computational algebraic geometry [CLO07]. The results together with simulations for graphs with 6 and 7 nodes are given in Section 6. In Section 7, we describe how our new half-trek criterion behaves with respect to a graph decomposition technique for acyclic mixed graphs that is due to [Tia05]. Concluding remarks are given in Section 12.

2. Preliminaries on treks

A path from node $v$ to node $w$ in a mixed graph $G = (V, D, B)$ is a sequence of edges, each from either $D$ or $B$, that connect the consecutive nodes in a sequence of nodes beginning at $v$ and ending in $w$. We do not require paths to be simple or even to obey directions, that is, a path may include a particular edge more than once, the nodes that are part of the edges need not all be distinct, and directed edges may be traversed in the wrong direction. A path $\pi$ from $v$ to $w$ is a directed path if all its edges are directed and pointing to $w$, that is, $\pi$ is of the form

\[ v = v_0 \to v_1 \to \cdots \to v_r = w. \]

In a covariance matrix in a structural equation model, that is, a matrix structured as in Definition 1, the entry $\sigma_{vw}$ is a sum of terms that correspond to certain paths from $v$ to $w$. For instance, in Example 2, the variance

\[ \sigma_{33} = \omega_{33} + \omega_{23}\lambda_{23} + \omega_{23}\lambda_{23} + \lambda_{23}^2\omega_{22} + \lambda_{23}^2\lambda_{12}^2\omega_{11} \tag{2.1} \]

is a sum of five terms that are associated with the trivial path 3, which has no edges, and the four additional paths

\[ 3 \leftrightarrow 2 \to 3, \qquad 3 \leftarrow 2 \leftrightarrow 3, \qquad 3 \leftarrow 2 \to 3, \qquad 3 \leftarrow 2 \leftarrow 1 \to 2 \to 3. \]

In the literature, the paths that contribute to a covariance are known as treks; compare, e.g., [STD10] and the references therein. A trek from ‘source’ $v$ to ‘target’ $w$ is a path from $v$ to $w$ whose consecutive edges do not have any colliding arrowheads. In other words, a trek from $v$ to $w$ is a path of one of the two following forms:

\[ v^L_l \leftarrow v^L_{l-1} \leftarrow \cdots \leftarrow v^L_1 \leftarrow v^L_0 \longleftrightarrow v^R_0 \to v^R_1 \to \cdots \to v^R_{r-1} \to v^R_r \]

or

\[ v^L_l \leftarrow v^L_{l-1} \leftarrow \cdots \leftarrow v^L_1 \longleftarrow v^T \longrightarrow v^R_1 \to \cdots \to v^R_{r-1} \to v^R_r, \]

where the endpoints are $v^L_l = v$, $v^R_r = w$. In the first case, we say that the left-hand side of $\pi$, written $\mathrm{Left}(\pi)$, is the set of nodes $\{v^L_0, v^L_1, \dots, v^L_l\}$, and the right-hand side, written $\mathrm{Right}(\pi)$, is the set of nodes $\{v^R_0, v^R_1, \dots, v^R_r\}$. In the second case, $\mathrm{Left}(\pi) = \{v^T, v^L_1, \dots, v^L_l\}$ and $\mathrm{Right}(\pi) = \{v^T, v^R_1, \dots, v^R_r\}$; note that the ‘top’ node $v^T$ is part of both sides of the trek. As pointed out before, paths and in particular treks are not required to be simple. A trek $\pi$ may thus pass through a node on both its left- and right-hand sides. If the graph contains a cycle, then the left- or right-hand side of $\pi$ may contain this cycle. A trek from $v$ to $v$ may have no edges, in which case $v$ is the top node, and $\mathrm{Left}(\pi) = \mathrm{Right}(\pi) = \{v\}$, and we call the trek trivial.

A trek is obtained by concatenating two directed paths at a common top node or by joining them with a bidirected edge, and the connection between the matrix entries and treks is due to the fact that

\[ \left( (I - \Lambda)^{-1} \right)_{vw} = \sum_{\pi \in \mathcal{P}(v, w)} \; \prod_{x \to y \in \pi} \lambda_{xy}, \tag{2.2} \]

where $\mathcal{P}(v, w)$ is the set of directed paths from $v$ to $w$ in $G$. The equality in (2.2) follows by writing $(I - \Lambda)^{-1} = I + \Lambda + \Lambda^2 + \cdots$. For a precise statement about the form of the covariance matrix $\Sigma$, let $\mathcal{T}(v, w)$ be the set of all treks from $v$ to $w$. For a trek $\pi$ that contains no bidirected edge and has top node $v$, define a trek monomial as

\[ \pi(\lambda, \omega) = \omega_{vv} \prod_{x \to y \in \pi} \lambda_{xy}. \]

For a trek $\pi$ that contains a bidirected edge $v \leftrightarrow w$, define the trek monomial as

\[ \pi(\lambda, \omega) = \omega_{vw} \prod_{x \to y \in \pi} \lambda_{xy}. \]

Then the following rule [SGS00, Wri21, Wri34] expresses the covariance matrix $\Sigma$ as a summation over treks; compare the example in (2.1).

Trek Rule. The covariance matrix $\Sigma$ for a mixed graph $G$ is given by

\[ \sigma_{vw} = \sum_{\pi \in \mathcal{T}(v, w)} \pi(\lambda, \omega). \tag{2.3} \]

Figure 2. An acyclic mixed graph (on the nodes 1, 2, 3, 4, 5).

We remark that if $G$ is acyclic then $\Lambda^k = 0$ for all $k \geq m$, and so the expression in (2.2) is polynomial. Similarly, (2.3) writes $\sigma_{vw}$ as a polynomial. If $G$ is cyclic, then one obtains power series that converge if the entries of $\Lambda$ are small enough. However, in the proofs of Section 8 it will also be useful to treat these as formal power series.
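For an acyclic graph, identity (2.2) can be verified directly, since the series $I + \Lambda + \Lambda^2 + \cdots$ stops after the term $\Lambda^{m-1}$. The sketch below is an illustration under stated assumptions: the three-node graph with edges $1 \to 2$, $2 \to 3$, $1 \to 3$ and its coefficients are hypothetical, and the function names `neumann_inverse` and `path_sum` are not from the paper. It compares the truncated series with the sum over directed paths.

```python
from fractions import Fraction

def neumann_inverse(lam, m):
    """Return I + L + ... + L^(m-1), which equals (I - L)^{-1} when L^m = 0."""
    L = [[Fraction(lam.get((v, w), 0)) for w in range(1, m + 1)]
         for v in range(1, m + 1)]
    identity = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    total, power = identity, identity
    for _ in range(m - 1):
        power = [[sum(power[i][k] * L[k][j] for k in range(m))
                  for j in range(m)] for i in range(m)]
        total = [[total[i][j] + power[i][j] for j in range(m)]
                 for i in range(m)]
    return total

def path_sum(lam, v, w):
    """Right-hand side of (2.2); the recursion terminates only for acyclic graphs."""
    trivial = 1 if v == w else 0  # the path without edges has empty product 1
    return trivial + sum(c * path_sum(lam, y, w)
                         for (x, y), c in lam.items() if x == v)

# Hypothetical acyclic example: edges 1 -> 2, 2 -> 3, 1 -> 3 with coefficients 2, 3, 5.
lam = {(1, 2): 2, (2, 3): 3, (1, 3): 5}
inverse = neumann_inverse(lam, 3)
```

Here the entry for the pair $(1, 3)$ collects the two directed paths $1 \to 3$ and $1 \to 2 \to 3$. For cyclic graphs the recursion in `path_sum` would not terminate, in line with the power series mentioned in the remark.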

Our identifiability results involve conditions that refer to paths that we term half-treks. A half-trek $\pi$ is a trek with $|\mathrm{Left}(\pi)| = 1$, meaning that $\pi$ is of the form

\[ v^L_0 \longleftrightarrow v^R_0 \to v^R_1 \to \cdots \to v^R_{r-1} \to v^R_r \]

or

\[ v^T \to v^R_1 \to \cdots \to v^R_{r-1} \to v^R_r. \]

Example 3. In the graph shown in Figure 2,

(a) neither $\pi_1 : 2 \to 3 \to 4 \leftarrow 3$ nor $\pi_2 : 3 \to 4 \leftrightarrow 1$ are treks, due to the colliding arrowheads at node 4.

(b) $\pi : 2 \leftarrow 1 \leftrightarrow 4 \to 5$ is a trek, but not a half-trek. $\mathrm{Left}(\pi) = \{1, 2\}$ and $\mathrm{Right}(\pi) = \{4, 5\}$.

(c) $\pi : 1 \to 2 \to 3$ is a half-trek with $\mathrm{Left}(\pi) = \{1\}$ and $\mathrm{Right}(\pi) = \{1, 2, 3\}$.
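The classification in Example 3 depends only on the sequence of edge types read from source to target. As a small sketch (the encoding of a path as a list of the strings `'<-'`, `'<->'`, `'->'` and the function names are assumptions made for illustration), a path is a trek exactly when it consists of a run of backward edges, then at most one bidirected edge, then a run of forward edges; it is a half-trek when, in addition, it has no backward edge.

```python
def is_trek(edges):
    """Check the pattern  <- ... <-  (<->)?  -> ... ->  (no colliding arrowheads)."""
    i = 0
    while i < len(edges) and edges[i] == '<-':
        i += 1
    if i < len(edges) and edges[i] == '<->':
        i += 1
    while i < len(edges) and edges[i] == '->':
        i += 1
    return i == len(edges)

def is_half_trek(edges):
    """A half-trek is a trek whose left-hand side is a single node."""
    return is_trek(edges) and '<-' not in edges

# The paths of Example 3, encoded by their edge types only.
pi_1 = ['->', '->', '<-']   # 2 -> 3 -> 4 <- 3
pi_2 = ['->', '<->']        # 3 -> 4 <-> 1
pi_b = ['<-', '<->', '->']  # 2 <- 1 <-> 4 -> 5
pi_c = ['->', '->']         # 1 -> 2 -> 3
```

The functions do not check that the edges exist in a given graph; they only test the arrowhead pattern. The empty list encodes a trivial trek, which passes both tests.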

It will also be important to consider sets of treks. For a set of $n$ treks, $\Pi = \{\pi_1, \dots, \pi_n\}$, let $x_i$ and $y_i$ be the source and the target of $\pi_i$, respectively. If the sources are all distinct, and the targets are all distinct, then we say that $\Pi$ is a system of treks from $X = \{x_1, \dots, x_n\}$ to $Y = \{y_1, \dots, y_n\}$, which we write as $\Pi : X \rightrightarrows Y$. Note that there may be overlap between the sources in $X$ and the targets in $Y$, that is, we might have $X \cap Y \neq \emptyset$. The system $\Pi$ is a system of half-treks if every trek $\pi_i$ is a half-trek. Finally, a set of treks $\Pi = \{\pi_1, \dots, \pi_n\}$ has no sided intersection if

\[ \mathrm{Left}(\pi_i) \cap \mathrm{Left}(\pi_j) = \emptyset = \mathrm{Right}(\pi_i) \cap \mathrm{Right}(\pi_j) \qquad \forall\, i \neq j. \]

Example 4. Consider again the graph from Figure 2.

(a) The pair of treks

\[ \pi_1 : 3 \to 4 \to 5, \qquad \pi_2 : 4 \leftrightarrow 1 \]

forms a system of treks $\Pi = \{\pi_1, \pi_2\}$ between $X = \{3, 4\}$ and $Y = \{1, 5\}$. The node 4 appears in both treks, but is in only the right-hand side of $\pi_1$ and only the left-hand side of $\pi_2$. Therefore, $\Pi$ has no sided intersection.

(b) The set $\Pi = \{\pi_1, \pi_2\}$ comprising the two treks

\[ \pi_1 : 1 \leftrightarrow 4, \qquad \pi_2 : 3 \to 4 \to 5 \]

is a system of treks between $X = \{1, 3\}$ and $Y = \{4, 5\}$. Since node 4 is in $\mathrm{Right}(\pi_1) \cap \mathrm{Right}(\pi_2)$, the system $\Pi$ has a sided intersection.
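The two cases of Example 4 can be reproduced mechanically. In the sketch below, a trek is encoded as a pair `(nodes, edges)` with the edge types written as `'<-'`, `'<->'`, `'->'`; this encoding and the names `sides` and `has_sided_intersection` are assumptions for illustration, and the input is assumed to be a valid trek. The function `sides` splits a trek into its left- and right-hand sides according to the two forms given in Section 2.

```python
from itertools import combinations

def sides(nodes, edges):
    """Return (Left, Right) of a trek given its node sequence and edge types."""
    if '<->' in edges:
        k = edges.index('<->')           # bidirected edge joins nodes[k] and nodes[k+1]
        return set(nodes[:k + 1]), set(nodes[k + 1:])
    t = edges.count('<-')                # index of the top node
    return set(nodes[:t + 1]), set(nodes[t:])

def has_sided_intersection(treks):
    """True if two treks share a node on their left sides or on their right sides."""
    split = [sides(nodes, edges) for nodes, edges in treks]
    return any(bool(l1 & l2) or bool(r1 & r2)
               for (l1, r1), (l2, r2) in combinations(split, 2))

# Example 4(a): 3 -> 4 -> 5 and 4 <-> 1.
system_a = [([3, 4, 5], ['->', '->']), ([4, 1], ['<->'])]
# Example 4(b): 1 <-> 4 and 3 -> 4 -> 5.
system_b = [([1, 4], ['<->']), ([3, 4, 5], ['->', '->'])]
```

In system (a) node 4 lies on the right of the first trek and on the left of the second, so no side is shared; in system (b) it lies on the right of both.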


3. Main identifiability and non-identifiability results

Define the set of parents of a node $v \in V$ as $P(v) = \{w : w \to v \in D\}$ and the set of siblings as $S(v) = \{w : w \leftrightarrow v \in B\}$. Let $H(v)$ be the set of nodes in $V \setminus (\{v\} \cup S(v))$ that can be reached from $v$ via a half-trek. These half-treks contain at least one directed edge. Put differently, a node $w \neq v$ that is not a sibling of $v$ is in $H(v)$ if $w$ is a proper descendant of $v$ or one of its siblings. The term ‘descendant’ is commonly used to refer to a node that can be reached by a directed path.
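The sets $P(v)$, $S(v)$ and $H(v)$ are straightforward to compute by a graph search. The following sketch assumes a hypothetical encoding, with $D$ as a set of ordered pairs and $B$ as a set of two-element frozensets, and uses the instrumental variable graph of Figure 1 as test input; the function names are illustrative and not from the paper.

```python
def parents(D, v):
    """P(v): nodes w with a directed edge w -> v."""
    return {u for (u, w) in D if w == v}

def siblings(B, v):
    """S(v): nodes w with a bidirected edge w <-> v."""
    return {u for edge in B if v in edge for u in edge if u != v}

def half_trek_reachable(D, B, v):
    """H(v): nodes outside {v} and S(v) that a half-trek from v can reach."""
    start = {v} | siblings(B, v)   # a half-trek starts at v or with an edge v <-> s
    reached = set(start)
    stack = list(start)
    while stack:                   # follow directed edges forward
        x = stack.pop()
        for (a, b) in D:
            if a == x and b not in reached:
                reached.add(b)
                stack.append(b)
    return reached - start

# Graph of Figure 1: 1 -> 2, 2 -> 3, 2 <-> 3.
D = {(1, 2), (2, 3)}
B = {frozenset({2, 3})}
```

For this graph, `half_trek_reachable(D, B, 1)` contains the nodes 2 and 3, while the corresponding sets for nodes 2 and 3 are empty because the only reachable nodes are the node itself or its sibling.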

Definition 4. A set of nodes $Y \subset V$ satisfies the half-trek criterion with respect to node $v \in V$ if

(i) $|Y| = |P(v)|$,

(ii) $Y \cap (\{v\} \cup S(v)) = \emptyset$, and

(iii) there is a system of half-treks with no sided intersection from $Y$ to $P(v)$.

We remark that if $P(v) = \emptyset$, then $Y = \emptyset$ satisfies the half-trek criterion with respect to $v$. We are now ready to state the main results of this paper.

Theorem 1 (HTC-identifiability). Let $(Y_v : v \in V)$ be a family of subsets of the vertex set $V$ of a mixed graph $G$. If, for each node $v$, the set $Y_v$ satisfies the half-trek criterion with respect to $v$, and there is a total ordering $\prec$ on the vertex set $V$ such that $w \prec v$ whenever $w \in Y_v \cap H(v)$, then $G$ is rationally identifiable.

Note that the existence of such a total ordering is equivalent to the condition that the relation $w \in Y_v \cap H(v)$ does not admit cycles; given the family $(Y_v : v \in V)$ this can be tested in polynomial time in the size of the graph. However, we do not know whether the existence of a family $(Y_v : v \in V)$, with $Y_v$ satisfying the half-trek criterion with respect to $v$ for each node $v$, can be checked in polynomial time.

Theorem 2 (HTC-non-identifiability). Suppose $G$ is a mixed graph in which every family $(Y_v : v \in V)$ of subsets of the vertex set $V$ either contains a set $Y_v$ that fails to satisfy the half-trek criterion with respect to $v$ or contains a pair of sets $(Y_v, Y_w)$ with $v \in Y_w$ and $w \in Y_v$. Then the parametrization $\phi_G$ is generically infinite-to-one.

The main ideas underlying the two results are as follows. Under the conditions given in Theorem 1, it is possible to recover the entries in the matrix $\Lambda$, column-by-column, following the given ordering of the nodes. Each column is found by solving a linear equation system that can be proven to have a unique solution. The details of these computations are given in Section 8, where we prove Theorem 1. The proof of Theorem 2 is also in Section 8 and rests on the fact that under the given conditions the Jacobian of $\phi_G$ cannot have full rank.

In light of the two theorems we refer to a mixed graph $G$ as

(i) HTC-identifiable, if it satisfies the conditions of Theorem 1,

(ii) HTC-infinite-to-one, if it satisfies the conditions of Theorem 2,

(iii) HTC-classifiable, if it is either HTC-identifiable or HTC-infinite-to-one,

(iv) HTC-inconclusive, if it is not HTC-classifiable.

We now give a first example of an HTC-identifiable graph. Additional examples will be given in Section 5, where we will see graphs that are generically $h$-to-one with $2 \leq h < \infty$, but also that HTC-inconclusive graphs may be rationally identifiable or generically infinite-to-one.


Example 5. The graph in Figure 2 is HTC-identifiable, which can be shown as follows. Let

\[ Y_1 = \emptyset, \quad Y_2 = \{5\}, \quad Y_3 = \{2\}, \quad Y_4 = \{2\}, \quad Y_5 = \{3\}. \]

Then each $Y_v$ satisfies the half-trek criterion with respect to $v$ because

(a) trivially, $P(v) = \emptyset$ for $v = 1$;

(b) for $v = 2$, we have $5 \leftrightarrow 1 \to 2$;

(c) for $v = 3$, we have $2 \to 3$;

(d) for $v = 4$, we have $2 \to 3 \to 4$; and

(e) for $v = 5$, we have $3 \to 4 \to 5$.

Considering the descendant sets $H(v)$, we find that

\[ Y_1 \cap H(1) = \emptyset, \quad Y_2 \cap H(2) = \{5\}, \quad Y_3 \cap H(3) = \emptyset, \quad Y_4 \cap H(4) = \{2\}, \quad Y_5 \cap H(5) = \{3\}. \]

Hence, any ordering $\prec$ respecting $3 \prec 5 \prec 2 \prec 4$ will satisfy the conditions of Theorem 1.
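Finding an ordering as required by Theorem 1 amounts to a topological sort of the relation $w \in Y_v \cap H(v)$. The sketch below uses the standard-library module `graphlib` (Python 3.9 or later) and takes the intersections $Y_v \cap H(v)$ from Example 5 as given input; the function name `htc_ordering` and the dictionary layout are assumptions for illustration.

```python
from graphlib import TopologicalSorter, CycleError

def htc_ordering(must_precede):
    """Return a total ordering with w before v whenever w is in must_precede[v].

    must_precede[v] plays the role of Y_v ∩ H(v); None is returned if the
    relation contains a cycle, in which case no admissible ordering exists.
    """
    try:
        return list(TopologicalSorter(must_precede).static_order())
    except CycleError:
        return None

# Intersections Y_v ∩ H(v) as listed in Example 5.
example_5 = {1: set(), 2: {5}, 3: set(), 4: {2}, 5: {3}}
order = htc_ordering(example_5)
```

Any ordering returned for `example_5` places 3 before 5, 5 before 2, and 2 before 4; the position of node 1 is unconstrained.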

A mixed graph $G = (V, D, B)$ is simple if there is at most one edge between any pair of nodes, that is, if $D \cap B = \emptyset$ and $v \to w \in D$ implies $w \to v \notin D$. As observed in [BP02b], simple acyclic mixed graphs are rationally identifiable; compare also Corollary 3 in [DFS11]. It is not difficult to see that Theorem 1 includes this observation as a special case.

Proposition 1. If $G$ is a simple acyclic mixed graph, then $G$ is HTC-identifiable.

Proof. Since $G$ is simple, it holds for every node $v \in V$ that $P(v) \cap S(v) = \emptyset$ and, thus, $P(v)$ satisfies the half-trek criterion with respect to $v$. An acyclic graph has at least one topological ordering $\prec$, that is, an ordering such that $v \to w \in D$ only if $v \prec w$. In other words, $w \in P(v)$ implies $w \prec v$. Hence, the family $(P(v) : v \in V)$ together with a topological ordering $\prec$ satisfies the conditions of Theorem 1.

Another straightforward observation is that the map $\phi_G$ cannot be generically finite-to-one if the dimension of the domain of definition $\mathbb{R}^D_{\mathrm{reg}} \times \mathit{PD}(B)$ is larger than the dimension of the space of $m \times m$ symmetric matrices that contains the image of $\phi_G$. This occurs if $|D| + |B|$ is larger than $\binom{m}{2}$. Theorem 2 covers this observation.

Proposition 2. If a mixed graph $G = (V, D, B)$ with $V = [m]$ has $|D| + |B| > \binom{m}{2}$ edges, then $G$ is HTC-infinite-to-one.

Proof. Suppose $G$ is not HTC-infinite-to-one. Then there exist subsets $(Y_v : v \in V)$, where each $Y_v$ satisfies the half-trek criterion with respect to $v$ and for any pair of sets $(Y_v, Y_w)$ it holds that $v \in Y_w$ implies $w \notin Y_v$.

Fix a node $v \in V$. For every directed edge $u \to v \in D$, there is a corresponding node $y \in Y_v$ for which it holds, by Definition 4, that $y \leftrightarrow v \notin B$. Therefore, if there are $d_v$ directed edges pointing to $v$, then there are $d_v$ nodes, namely, the ones in $Y_v$, that are not adjacent to $v$ in the bidirected part $(V, B)$. If we consider another node $w \in V$, with $d_w$ parents, then there are again $d_w$ non-adjacencies $\{u, w\}$, $u \in Y_w$, in the bidirected part. Moreover, $\{v, w\}$ cannot appear as a non-adjacency for both node $v$ and node $w$ because of the requirement that $v \in Y_w$ imply $w \notin Y_v$. We conclude that there are at least $|D|$ non-edges in the bidirected part. In other words, $|D| + |B| \leq \binom{m}{2}$.
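The counting condition of Proposition 2 is a one-line test. As a hedged illustration: the parameter space has $|D| + |B| + m$ free entries (the $m$ diagonal entries of $\Omega$ are always free), while symmetric $m \times m$ matrices have $\binom{m}{2} + m$, so the comparison reduces to $|D| + |B|$ against $\binom{m}{2}$. The function name below is hypothetical.

```python
from math import comb

def exceeds_edge_bound(m, num_directed, num_bidirected):
    """True if |D| + |B| > binom(m, 2), the hypothesis of Proposition 2."""
    return num_directed + num_bidirected > comb(m, 2)

# The graph of Figure 1 has m = 3, |D| = 2 and |B| = 1, so the bound is met
# with equality and the proposition does not apply.
iv_exceeds = exceeds_edge_bound(3, 2, 1)
```

The test is only a sufficient condition for being generically infinite-to-one; graphs within the bound need the finer criteria of Theorems 1 and 2.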


We conclude the discussion of Theorems 1 and 2 by pointing out that HTC-identifiability is equivalent to a seemingly weaker criterion.

Definition 5. A set of nodes $Y \subset V$ satisfies the weak half-trek criterion with respect to node $v \in V$ if

(i) $|Y| = |P(v)|$,

(ii) $Y \cap (\{v\} \cup S(v)) = \emptyset$, and

(iii) there is a system of treks with no sided intersection from $Y$ to $P(v)$ such that for any $w \in Y \cap H(v)$, the trek originating at $w$ is a half-trek.

Lemma 1. Suppose the set $W \subset V$ satisfies the weak half-trek criterion with respect to some node $v$. Then there exists a set $Y$ satisfying the half-trek criterion with respect to $v$, such that $Y \cap H(v) = W \cap H(v)$.

Lemma 1 is proved in the appendix. It yields the following result, which is proved in Section 8.

Theorem 3 (Weak HTC). Theorems 1 and 2 hold when using the weak half-trek criterion instead of the half-trek criterion. Moreover, a graph $G$ can be proved to be rationally identifiable (or generically infinite-to-one) using the weak half-trek criterion if and only if $G$ is HTC-identifiable (or HTC-infinite-to-one).

4. G-criterion

The G-criterion, proposed in [BP06], is a sufficient criterion for rational identifiability in acyclic mixed graphs. The criterion attempts to prove the fiber $\mathcal{F}(\Lambda, \Omega)$ to be equal to $\{(\Lambda, \Omega)\}$ by solving the equation system

\[ \Sigma = (I - \Lambda)^{-T} \Omega (I - \Lambda)^{-1} \]

in stepwise manner. The steps yield the entries in $\Lambda$ column-by-column and, simultaneously, more and more rows and columns for principal submatrices of $\Omega$. As explained in Section 8, the new half-trek method we proposed in Section 3 starts from an equation system that has $\Omega$ eliminated and then only proves the entries of $\Lambda$ to be uniquely identified. In this section, we show that, due to this key simplification, the sufficient condition in the half-trek method provides an improvement over the G-criterion for acyclic mixed graphs.

To prepare for the comparison of the two criteria, we first restate the identifiability theorem associated to the G-criterion in our own notation. Enumerate the vertex set of an acyclic mixed graph $G$ according to any topological ordering as $V = [m] = \{1, \dots, m\}$. (Then $v \to w$ only if $v < w$.) Use the ordering to uniquely associate bidirected edges to individual nodes by defining, for each $v \in V$, the sets of siblings $S_<(v) = \{w \in S(v) : w < v\}$ and $S_>(v) = \{w \in S(v) : w > v\}$. For a trek $\pi$, we write $t(\pi)$ to denote the target node; that is, $\pi$ is a trek from some node to $t(\pi)$.

Definition 6 ([BP06]). A set of nodes $A \subset V$ satisfies the G-criterion with respect to a node $v \in V$ if $A \subset V \setminus \{v\}$ and $A$ can be partitioned into two (disjoint) sets $Y, Z$ with $|Y| = |P(v)|$ and $|Z| = |S_<(v)|$, with two systems of treks $\Pi : Y \rightrightarrows P(v)$ and $\Psi : Z \rightrightarrows S_<(v)$, such that the following condition holds:

If each trek $\pi \in \Pi$ is extended to a path $\pi'$ by adding the edge $t(\pi) \to v$ to the right-hand side, and each trek $\pi \in \Psi$ is similarly extended using $t(\pi) \leftrightarrow v$, then the set of paths $\{\pi' : \pi \in \Pi \cup \Psi\}$ is a set of treks that has no sided intersection except at the common target node $v$.

Figure 3. Rationally identifiable mixed graphs (panels (a) to (e), each on the nodes 1, 2, 3, 4, 5).

Note that the paths $\pi'$ for $\pi \in \Pi$ are always treks. For $\psi \in \Psi$, the requirement that $\psi'$ is a trek means that $\psi$ cannot have an arrowhead at its target node.

For the statement of the main theorem about identifiability using the G-criterion, define the depth of a node $v$ to be the length of the longest directed path terminating at $v$. This number is denoted by $\mathrm{Depth}(v)$.
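In an acyclic graph the depth of a node satisfies a simple recursion: it is zero for a node without parents and otherwise one more than the largest depth among its parents. A minimal sketch follows, with an assumed encoding of $D$ as a set of ordered pairs and a hypothetical function name; it is meant for acyclic input only, since the recursion does not terminate on a directed cycle.

```python
from functools import lru_cache

def depths(D, nodes):
    """Depth(v) for every node v of an acyclic directed part (V, D)."""
    parent_lists = {v: [u for (u, w) in D if w == v] for v in nodes}

    @lru_cache(maxsize=None)
    def depth(v):
        # Longest directed path ending at v; 0 if v has no parents.
        return max((1 + depth(u) for u in parent_lists[v]), default=0)

    return {v: depth(v) for v in nodes}

# Directed part of the graph in Figure 1: 1 -> 2 -> 3.
iv_depths = depths({(1, 2), (2, 3)}, [1, 2, 3])
```

Adding a shortcut edge such as $1 \to 3$ would leave the depth of node 3 unchanged, because the depth records the longest path rather than the shortest.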

Theorem 4 ([BP06]). Suppose $(A_v : v \in V)$ is a family of subsets of the vertex set $V$ of an acyclic mixed graph $G$ and, for each $v$, the set $A_v$ satisfies the G-criterion with respect to $v$. Then $G$ is rationally identifiable if at least one of the following two conditions is satisfied:

(C1) For all $v$ and all $w \in A_v$, it holds that $\mathrm{Depth}(w) < \mathrm{Depth}(v)$.

(C2) For all $v$ and all $w \in A_v \cap (H(v) \cup S_>(v))$, the trek associated to node $w$ in the definition of the G-criterion is a half-trek. Furthermore, there is a total ordering $\prec$ on $V$, such that if $w \in A_v \cap (H(v) \cup S_>(v))$, then $w \prec v$.

We remark that the ordering $\prec$ in condition (C2) need not agree with any topological ordering of the graph. When using only condition (C1) the theorem was given in [BP02a], and the literature is not always clear on which version of the G-criterion is concerned. For instance, all examples in [CK10] can be proven to be rationally identifiable by means of Theorem 4 as stated here.

We now compare the G-criterion to the half-trek criterion. We say that a graph $G$ is GC-identifiable if it satisfies the conditions of Theorem 4. The next theorem and the proposition that follows are proved in Section 10. They demonstrate that the half-trek method provides an improvement over the G-criterion even for acyclic mixed graphs.

Theorem 5. A GC-identifiable acyclic mixed graph is also HTC-identifiable.

The graph in Figure 2 is HTC-identifiable, as was shown in Example 5.

Proposition 3. The acyclic mixed graph in Figure 2 is not GC-identifiable.

Figure 4. Generically infinite-to-one graphs (panels (a) to (d), each on the nodes 1, 2, 3, 4, 5).

5. Examples

In the previous section, the acyclic mixed graph from Figure 2 was shown tobe HTC-identifiable but not GC-identifiable. In this section, we give several otherexamples that illustrate the conditions of our theorems and the ground that liesbeyond them. The examples are selected from the computational experiments thatwe report on in Section 6. We begin with the identifiable class.

Example 6. Figure 3 shows 5 rationally identifiable mixed graphs:

(a) This graph is simple and acyclic and, thus, HTC- and GC-identifiable; recall Proposition 1. There are pairs (Λ,Ω) for which the fiber F(Λ,Ω) has positive dimension. By Theorem 2 in [DFS11], removing the edge 1 ↔ 3 would give a new graph with all fibers of the form F(Λ,Ω) = {(Λ,Ω)}.

(b) The next graph is acyclic but not simple. It is HTC- and GC-identifiable.

(c) This acyclic graph is HTC-inconclusive. The bidirected part being connected, the example is not covered by the graph decomposition technique discussed in Section 7.

(d) This is an example of a cyclic graph that is HTC-identifiable.

(e) This cyclic graph is HTC-inconclusive.

On m = 5 nodes, graphs with more than $\binom{5}{2} = 10$ edges are trivially generically infinite-to-one. The next example gives non-trivial non-identifiable graphs.
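The edge bound reflects a parameter count that is made explicit in the proof of Lemma 4: a mixed graph with |D| directed and |B| bidirected edges has |D| + |B| + m free parameters, while symmetric m × m matrices form a space of dimension $\binom{m+1}{2} = \binom{m}{2} + m$. The following Python sketch spells out this comparison; the function name is hypothetical and chosen only for illustration.

```python
from math import comb


def exceeds_dimension_bound(m: int, n_directed: int, n_bidirected: int) -> bool:
    """Return True if the parameter count |D| + |B| + m exceeds C(m+1, 2).

    When this holds, the parametrization maps into a space of smaller
    dimension, so generic fibers must have positive dimension.
    """
    n_params = n_directed + n_bidirected + m
    dim_symmetric = comb(m + 1, 2)
    return n_params > dim_symmetric


# Illustrative edge counts on m = 5 nodes (10 edges versus 11 edges).
print(exceeds_dimension_bound(5, 6, 4))
print(exceeds_dimension_bound(5, 7, 4))
```

The count only detects the trivial cases; graphs with at most $\binom{m}{2}$ edges can still be generically infinite-to-one, as the next example shows.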

Example 7. All 4 graphs in Figure 4 are generically infinite-to-one. The acyclic graph in (a) and the cyclic graph in (c) are HTC-infinite-to-one. The acyclic graph in (b) and the cyclic graph in (d) are HTC-inconclusive.

Many HTC-inconclusive graphs have fibers that are of cardinality 2 ≤ h < ∞. An example of an acyclic 4-node graph that is generically 2-to-one was given in [Bri04]. Our next example lists more graphs of this generically finite-to-one type.

Example 8. Figure 5 shows four mixed graphs that are HTC-inconclusive and not generically identifiable. All the graphs have fibers that are generically finite:


Figure 5. Generically finite-to-one graphs. [Panels (a) to (d), each a mixed graph on nodes 1 to 5; drawings not reproduced.]

(a) This graph is generically 2-to-1. We note that the coefficients $\lambda_{v5}$, $v \in [4]$, can be identified; that is, any two matrices $\Lambda, \Lambda'$ appearing in the same fiber have identical fifth column.

(b) Generically, the fibers of this graph have cardinality either one or three. For instance, let

$$\omega_{11} = \dots = \omega_{55} = 1, \qquad \omega_{12} = \omega_{13} = \omega_{15} = \tfrac{1}{5}, \qquad \lambda_{23} = 1.$$

Define

$$f(\lambda_{12}) = 529\lambda_{12}^4 - 460\lambda_{12}^3 - 3642\lambda_{12}^2 - 2380\lambda_{12} - 4271.$$

Then, not considering the non-generic situation with $f(\lambda_{12}) = 0$, we have

$$|F(\Lambda,\Omega)| = \begin{cases} 3 & \text{if } f(\lambda_{12}) > 0, \\ 1 & \text{if } f(\lambda_{12}) < 0. \end{cases}$$

The polynomial f has two real roots, which are approximately −2.16 and 3.44.

(c) As shown in [DFS11], a cycle of length 3 or more is generically 2-to-1.

(d) The next graph is not generically identifiable. Generically, its fibers have at least two elements but not more than 10. Using the terminology from Definition 7 below, the graph has degree of identifiability 10. We do not know of an example of a fiber with more than two elements.
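The sign condition in Example 8(b) can be explored numerically. The sketch below evaluates the quartic f from that example and locates its two real roots by bisection; the helper names and the bracketing intervals [−3, −2] and [3, 4] are our own choices for illustration and are not part of the original computations.

```python
def f(x: float) -> float:
    """Quartic from Example 8(b), evaluated by Horner's rule."""
    return (((529.0 * x - 460.0) * x - 3642.0) * x - 2380.0) * x - 4271.0


def bisect_root(lo: float, hi: float, tol: float = 1e-10) -> float:
    """Locate a sign change of f inside [lo, hi] by bisection."""
    f_lo = f(lo)
    if f_lo * f(hi) >= 0:
        raise ValueError("interval does not bracket a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def fiber_size(lambda12: float) -> int:
    """Generic fiber cardinality read off from the sign of f."""
    value = f(lambda12)
    if value == 0:
        raise ValueError("non-generic parameter value, f vanishes")
    return 3 if value > 0 else 1


neg_root = bisect_root(-3.0, -2.0)
pos_root = bisect_root(3.0, 4.0)
print(neg_root, pos_root)
print(fiber_size(0.0), fiber_size(5.0))
```

Since the leading coefficient of f is positive, values of $\lambda_{12}$ between the two real roots give f < 0 and hence a single-point fiber, while values outside give three points.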

6. Computational experiments

When the number m of nodes in the graph is small, then the identification problem can be fully solved by means of algebraic techniques. In this section we report on the results of an exhaustive study of all mixed graphs with m ≤ 5 nodes as well as simulations for graphs with m = 6 and 7 nodes. In our exhaustive computations, counts of graphs refer to unlabeled graphs, that is, we count isomorphism classes


Table 1. Classification of unlabeled mixed graphs with 3 ≤ m ≤ 5 nodes; column 'HTC' gives counts of HTC-classifiable graphs.

                                   m = 3        m = 4              m = 5
Unlabeled mixed graphs           Total   HTC   Total   HTC       Total       HTC
Acyclic, ≤ (m choose 2) edges       22           715           103,670
  rationally identifiable           17    17     343   343      32,378    32,257
  generically finite-to-one          0     -       4     -       1,166         -
  generically ∞-to-one               5     5     368   368      70,126    70,099
Acyclic, > (m choose 2) edges       18           852           152,520
Cyclic, ≤ (m choose 2) edges         6           718           348,175
  rationally identifiable            2     2     239   230      91,040    78,586
  generically finite-to-one          1     -      75     -      44,703         -
  generically ∞-to-one               3     3     404   383     212,432   202,697
Cyclic, > (m choose 2) edges        58         9,307         8,439,859

of graphs with respect to permutation of the vertex set V = [m]. A general introduction to the algebraic techniques that underlie our computations can be found in [CLO07]. The use of computer algebra for parameter identification problems is explained in [GPSS10]. We give some more details in Appendix A. All algebraic computations were done using the software Singular [DGPS11]; the combinatorial criteria were implemented in R [R D11].

The results for m ≤ 5 are given in Table 1. This table distinguishes between acyclic and cyclic (that is, non-acyclic) graphs. In each case, we single out the graphs with more than $\binom{m}{2}$ edges. These are trivially generically infinite-to-one and also HTC-infinite-to-one according to Proposition 2. The remaining graphs are classified into three disjoint groups, namely, rationally identifiable graphs, generically infinite-to-one graphs and generically finite-to-one graphs. The following notion makes the distinctions and terminology precise. Here, we let $\mathbb{C}^D_{\mathrm{reg}}$ be defined as $\mathbb{R}^D_{\mathrm{reg}}$ but allowing for complex matrix entries. We write $\mathbb{C}^{m \times m}_{\mathrm{sym}}$ for the space of symmetric m × m complex matrices.

Definition 7. Let G = (V,D,B) be a mixed graph. Then the complex rational map $\phi_{G,\mathbb{C}}$, obtained by extending the map $\phi_G$ to $\mathbb{C}^D_{\mathrm{reg}} \times \mathbb{C}^{m \times m}_{\mathrm{sym}}$, is generically h-to-one with $h \in \mathbb{N} \cup \{\infty\}$, and we call h = ID(G) the degree of identifiability of G.

A mixed graph G is rationally identifiable if and only if its degree of identifiability ID(G) = 1. Similarly, G is generically infinite-to-one if and only if ID(G) = ∞; in that case the fiber $F(\Lambda,\Omega) \subset \mathbb{R}^D_{\mathrm{reg}} \times PD(B)$ defined in (1.4) is generically of positive dimension. In Table 1, a graph G is generically finite-to-one if 2 ≤ ID(G) < ∞ and, thus, the fiber F(Λ,Ω) is generically finite with |F(Λ,Ω)| ≤ ID(G). If ID(G) is finite and even, then G cannot be generically identifiable because polynomial equations have complex solutions appearing in conjugate pairs and F(Λ,Ω) always contains at least one (real) point, namely, the pair (Λ,Ω) itself. If ID(G) is odd, then we cannot exclude the possibility that the equation defining the fiber F(Λ,Ω)


Figure 6. Classification of labeled mixed graphs with m = 7 nodes. Each bar represents 5,000 randomly drawn graphs with fixed number of edges, ranging from 1 to $\binom{m}{2}$. [Bar chart with an acyclic and a cyclic panel; horizontal axis: number of edges, 1 to 21; vertical axis: number of graphs, 0 to 5000; legend: HTC-identifiable, HTC-infinite-to-one, HTC-inconclusive.]

generically only has one real point, leading to generic identifiability. However, we did not observe this in any examples we checked.

Table 1 shows that our half-trek method yields a perfect classification of acyclic graphs with m ≤ 4 nodes and cyclic graphs with m ≤ 3 nodes. Among the acyclic graphs with m = 5 nodes and at most $\binom{m}{2} = 10$ edges, our method misses 121 rationally identifiable graphs and 27 generically infinite-to-one graphs. The gaps are larger for cyclic graphs with at most 10 edges, but the method still classifies 86% of the rationally identifiable graphs correctly and misses less than 5% of the generically infinite-to-one graphs. The degree of identifiability ID(G) of an acyclic graph G with 5 nodes can be any number in [4]. For example, the graphs in Figure 5(a) and (b) have ID(G) equal to 2 and 3, respectively. For a cyclic graph G with 5 nodes, the degree can be any number in $[8] \cup \{10\}$; recall the example in Figure 5(d).
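The counts and percentages quoted for m = 5 follow directly from the entries of Table 1. As a quick arithmetic cross-check, the sketch below transcribes the relevant rows and recomputes the figures; the dictionary layout and function names are ours and serve only as an illustration.

```python
# Counts for m = 5 and at most C(5,2) = 10 edges, transcribed from Table 1.
# Each entry is (total, HTC-classified); None marks a class that the
# half-trek criteria cannot certify.
TABLE_M5 = {
    "acyclic": {
        "rationally identifiable": (32_378, 32_257),
        "generically finite-to-one": (1_166, None),
        "generically infinite-to-one": (70_126, 70_099),
    },
    "cyclic": {
        "rationally identifiable": (91_040, 78_586),
        "generically finite-to-one": (44_703, None),
        "generically infinite-to-one": (212_432, 202_697),
    },
}


def class_total(kind: str) -> int:
    """Sum of the three disjoint classes for acyclic or cyclic graphs."""
    return sum(total for total, _ in TABLE_M5[kind].values())


def missed(kind: str, label: str) -> int:
    """Number of graphs in a class that the half-trek method does not classify."""
    total, htc = TABLE_M5[kind][label]
    return total - htc


def htc_rate(kind: str, label: str) -> float:
    """Fraction of graphs in a class that the half-trek method classifies."""
    total, htc = TABLE_M5[kind][label]
    return htc / total


print(class_total("acyclic"), class_total("cyclic"))
print(missed("acyclic", "rationally identifiable"),
      missed("acyclic", "generically infinite-to-one"))
print(f"{htc_rate('cyclic', 'rationally identifiable'):.1%}",
      f"{1 - htc_rate('cyclic', 'generically infinite-to-one'):.1%}")
```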

In our computations we tracked which acyclic graphs are rationally identifiable according to the G-criterion as in Theorem 4. Since the method depends on the choice of a topological ordering of the nodes, we tested each possible topological ordering of the nodes. Our computation shows that the G-criterion finds all rationally identifiable acyclic graphs with m ≤ 4 nodes. For m = 5, the G-criterion proves 31,830 acyclic graphs to be rationally identifiable, that is, it misses 427 of the HTC-identifiable acyclic graphs.

Exhaustive computations become prohibitive for more than 5 nodes. Instead we randomly generated mixed graphs, with m = 6 or m = 7 nodes, and tested whether they are HTC-identifiable, HTC-infinite-to-one or HTC-inconclusive. More


Figure 7. An acyclic mixed graph shown in (a) and its two mixed components shown in (b) and (c). [Graph drawings not reproduced.]

precisely, for each value $n = 1, 2, \dots, \binom{m}{2}$, we randomly sampled 5,000 labeled mixed graphs on m nodes with n edges, by selecting a subset of size n from the set of all possible edges, which consists of $2 \cdot \binom{m}{2}$ directed edges and $\binom{m}{2}$ bidirected edges. The results of these simulations for m = 7 are shown in Figure 6. (The results for m = 6 were very similar and are not shown.) As can be expected, the proportion of graphs that are acyclic decreases as m increases. Among both acyclic and cyclic graphs with at most $\binom{m}{2}$ edges, the proportion of graphs that are generically infinite-to-one increases as m increases. For each value of m, the vast majority of graphs that are rationally identifiable or generically infinite-to-one are HTC-classifiable. Most but not all of the HTC-identifiable acyclic graphs are also GC-identifiable; the difference is too small to be visible in the figure.
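The sampling scheme described above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the R code used for the experiments: the function names are hypothetical, a much smaller sample size is used, and the HTC classification step itself is not implemented; only the split into acyclic and cyclic graphs is shown.

```python
import random
from itertools import combinations, permutations


def sample_mixed_graph(m, n, rng):
    """Draw n distinct edges uniformly from the 2*C(m,2) directed and
    C(m,2) bidirected edges on nodes 1..m; return the sets (D, B)."""
    nodes = range(1, m + 1)
    directed = [("->", v, w) for v, w in permutations(nodes, 2)]
    bidirected = [("<->", v, w) for v, w in combinations(nodes, 2)]
    chosen = rng.sample(directed + bidirected, n)
    D = {(v, w) for kind, v, w in chosen if kind == "->"}
    B = {(v, w) for kind, v, w in chosen if kind == "<->"}
    return D, B


def is_acyclic(m, D):
    """Check acyclicity of the directed part (V, D) by repeatedly
    removing nodes without incoming edges (Kahn's algorithm)."""
    indegree = {v: 0 for v in range(1, m + 1)}
    for _, w in D:
        indegree[w] += 1
    stack = [v for v, deg in indegree.items() if deg == 0]
    removed = 0
    while stack:
        v = stack.pop()
        removed += 1
        for a, b in D:
            if a == v:
                indegree[b] -= 1
                if indegree[b] == 0:
                    stack.append(b)
    return removed == m


# Small demonstration with 200 graphs per edge count instead of 5,000.
rng = random.Random(0)
m = 7
max_edges = m * (m - 1) // 2
acyclic_share = {}
for n in range(1, max_edges + 1):
    draws = [sample_mixed_graph(m, n, rng) for _ in range(200)]
    acyclic_share[n] = sum(is_acyclic(m, D) for D, _ in draws) / len(draws)
print(acyclic_share[1], acyclic_share[max_edges])
```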

7. Decomposition of acyclic graphs

In this section we discuss how, for acyclic graphs, the scope of applicability of our half-trek method can be extended by using a graph decomposition due to [Tia05]. Let G = (V,D,B) be an acyclic mixed graph, and let $C_1, \dots, C_k \subset V$ be the (pairwise disjoint) vertex sets of the connected components of the bidirected part (V,B). For $j \in [k]$, let $B_j = B \cap (C_j \times C_j)$ be the bidirected edges in the jth connected component. Define $V_j$ to be the union of $C_j$ and any parents of nodes in $C_j$, that is,

$$V_j = C_j \cup \bigcup_{v \in C_j} P(v), \qquad j = 1, \dots, k.$$

Clearly, the sets $V_1, \dots, V_k$ need not be pairwise disjoint. Let $D_j$ be the set of edges v → w in the directed part (V,D) that have $v \in V_j$ and $w \in C_j$. The decomposition of [Tia05] involves the graphs $G_j = (V_j, D_j, B_j)$, for $j \in [k]$. We refer to these as the mixed components $G_1, \dots, G_k$ of G. Figure 7 gives an example.
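The construction of the mixed components is purely combinatorial and can be sketched as follows. The function below is a minimal illustration with a hypothetical name and input format (edges as pairs of node labels); the 4-node example graph is made up for the demonstration and is not the graph of Figure 7.

```python
def mixed_components(V, D, B):
    """Return the mixed components (V_j, D_j, B_j) of an acyclic mixed graph.

    V: iterable of nodes; D: set of pairs (v, w) for directed edges v -> w;
    B: set of pairs (v, w) for bidirected edges v <-> w.
    """
    # Connected components C_j of the bidirected part (V, B) via union-find.
    parent = {v: v for v in V}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for v, w in B:
        parent[find(v)] = find(w)

    classes = {}
    for v in V:
        classes.setdefault(find(v), set()).add(v)

    components = []
    for C in sorted(classes.values(), key=min):
        # Every edge into C has its tail among the parents of C, hence in V_j.
        D_j = {(v, w) for v, w in D if w in C}
        V_j = C | {v for v, _ in D_j}
        B_j = {(v, w) for v, w in B if v in C and w in C}
        components.append((V_j, D_j, B_j))
    return components


# Hypothetical example: 1 -> 2 -> 3 -> 4 with the bidirected edge 2 <-> 4.
V = {1, 2, 3, 4}
D = {(1, 2), (2, 3), (3, 4)}
B = {(2, 4)}
for V_j, D_j, B_j in mixed_components(V, D, B):
    print(sorted(V_j), sorted(D_j), sorted(B_j))
```

In this example the vertex sets of the components overlap, while each directed and each bidirected edge of the graph lands in exactly one component.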

The mixed components $G_1, \dots, G_k$ create a partition of the edges of G. There is an associated partition of the entries of $\Lambda \in \mathbb{R}^D$ that yields submatrices $\Lambda_1, \dots, \Lambda_k$ with each $\Lambda_j \in \mathbb{R}^{D_j}$; recall that for an acyclic graph $\mathbb{R}^D_{\mathrm{reg}} = \mathbb{R}^D$. Similarly, from $\Omega \in PD(B)$, we create matrices $\Omega_1, \dots, \Omega_k$ with each $\Omega_j \in PD(B_j)$, where $PD(B_j)$ is defined with respect to the graph $G_j$, that is, the set contains matrices indexed by $V_j \times V_j$. We define $\Omega_j$ by taking the submatrix $\Omega_{C_j,C_j}$ from Ω and extending it by setting $(\Omega_j)_{vv} = 1$ for all $v \in V_j \setminus C_j$. The work leading up to Theorems 1 and 2 in [Tia05] shows that, for all $j \in [k]$, there is a rational map $f_j$ defined on the entire cone of m × m positive definite matrices such that

$$f_j \circ \phi_G(\Lambda,\Omega) = \phi_{G_j}(\Lambda_j,\Omega_j)$$

for all $\Lambda \in \mathbb{R}^D$ and $\Omega \in PD(B)$. In turn, there is a rational map g defined everywhere on the product of the relevant cones of positive definite matrices such


that

$$g\bigl(\phi_{G_1}(\Lambda_1,\Omega_1), \dots, \phi_{G_k}(\Lambda_k,\Omega_k)\bigr) = \phi_G(\Lambda,\Omega)$$

for all $\Lambda \in \mathbb{R}^D$ and $\Omega \in PD(B)$. We thus obtain the following theorem.

Theorem 6. For an acyclic mixed graph G with mixed components $G_1, \dots, G_k$, the following holds:

(i) G is rationally (or generically) identifiable if and only if all components $G_1, \dots, G_k$ are rationally (or generically) identifiable;

(ii) G is generically infinite-to-one if and only if there exists a component $G_j$ that is generically infinite-to-one;

(iii) if each $G_j$ is generically $h_j$-to-one with $h_j < \infty$ then G is generically h-to-one with $h = \prod_{j=1}^k h_j$.

We remark that this theorem could also be stated as $ID(G) = \prod_{j=1}^k ID(G_j)$, in terms of the degree of identifiability from Definition 7.

The next theorem makes the observation that when applying our half-trek method to an acyclic graph, we may always first decompose the graph into its mixed components, which may result in computational savings.

Theorem 7. If an acyclic mixed graph G is HTC-identifiable then all its mixed components $G_1, \dots, G_k$ are HTC-identifiable. Furthermore, G is HTC-infinite-to-one if and only if there exists a mixed component $G_j$ that is HTC-infinite-to-one.

Proof. The claim about HTC-identifiability follows from Lemma 7 in Section 11. The second statement is a consequence of Lemmas 8 and 9, also from Section 11.

The benefit of the graph decomposition goes beyond computation in that it is possible that identification methods apply to all mixed components but not the original graph. In [Tia05], this is exemplified for the G-criterion. More precisely, the 4-node example given there concerns the early version of the G-criterion from [BP02a] that includes only condition (C1) from Theorem 4 but not condition (C2), which is due to [BP06]. However, graph decomposition allows one to also extend the scope of our more general half-trek method, where passing to mixed components can avoid problems with finding a suitable total ordering of the vertex set. Surprisingly, however, the extension is possible only for the sufficient condition, that is, HTC-identifiability; Theorem 7 gives an equivalence result for HTC-infinite-to-one graphs.

Proposition 4. The acyclic mixed graph in Figure 7(a) is not HTC-identifiable but both its mixed components are HTC-identifiable.

Proof. Suppose for a contradiction that the original graph G is HTC-identifiable and that the sets $Y_3$, $Y_4$ and $Y_5$ are part of the family of sets appearing in Theorem 1. In particular, each set has two elements and satisfies the half-trek criterion with respect to its subscript. Now, the presence of the edge 2 ↔ 3 implies that $Y_3 \subset \{1, 4, 5\}$. Moreover, $Y_3 \neq \{1, 4\}$ because the sole half-trek from 4 to 3 has 1 in its right-hand side and all half-treks from 1 to 3 are directed paths and thus have the source 1 on their right-hand side as well. It follows that $5 \in Y_3$ and, thus, $3 \notin Y_5$. Since 2 ↔ 5 is in G, it must hold that $Y_5 = \{1, 4\}$. Examining the descendant sets H(v) we see that the total ordering ≺ in Theorem 1 ought to satisfy 4 ≺ 5 ≺ 3. Since $1 \in S(4)$ and $3, 5 \in H(4)$, we conclude that $Y_4 \subset \{2\}$, which is a contradiction because $Y_4$ must have two elements.


Turning to the mixed components of G, it is clear that the component shown in Figure 7(c) is HTC-identifiable because it is a simple graph; recall Proposition 1. The component in Figure 7(b) is HTC-identifiable because Theorem 1 applies with the choice of

$$Y_1 = Y_4 = \emptyset, \quad Y_2 = \{1\}, \quad Y_5 = \{1, 4\}, \quad Y_3 = \{1, 5\},$$

and any ordering that respects 5 ≺ 3.

As seen in Table 1, the half-trek method misses 121 rationally identifiable acyclic graphs with 5 nodes; among them is the example from Proposition 4. After graph decomposition, the half-trek method proves 9 of the 121 examples to be rationally identifiable. The remaining 112 graphs all have a connected bidirected part; see Figure 3(c) for an example. On 5 nodes, there are 27 generically infinite-to-one graphs that are HTC-inconclusive. All of these have a connected bidirected part. (For larger graphs, we expect that there will be some graphs that are not bidirected-connected, where the half-trek method combined with decomposition will not apply.)

8. Proofs for the half-trek criterion

In this section we prove the two main theorems stated in Section 3. We begin with the identifiability theorem.

Theorem 1 (HTC-identifiability). Let $(Y_v : v \in V)$ be a family of subsets of the vertex set V of a mixed graph G. If, for each node v, the set $Y_v$ satisfies the half-trek criterion with respect to v, and there is a total ordering ≺ on the vertex set V such that w ≺ v whenever $w \in Y_v \cap H(v)$, then G is rationally identifiable.

Proof of Theorem 1. Let $\Sigma = \phi_G(\Lambda_0,\Omega_0)$ be a matrix in the image of $\phi_G$, given by a generically chosen pair $(\Lambda_0,\Omega_0) \in \Theta = \mathbb{R}^D_{\mathrm{reg}} \times PD(B)$. For generic identifiability, we need to show that the equation

(8.1) $\Sigma = (I - \Lambda)^{-T}\,\Omega\,(I - \Lambda)^{-1}$

has a unique solution in Θ, namely, $(\Lambda,\Omega) = (\Lambda_0,\Omega_0)$. However, a pair (Λ,Ω) solves (8.1) if and only if

(8.2) $\bigl[(I - \Lambda)^{T}\Sigma(I - \Lambda)\bigr]_{vw} = 0 \quad \forall (v,w) \notin B$ and $v \neq w$,

and

(8.3) $\bigl[(I - \Lambda)^{T}\Sigma(I - \Lambda)\bigr]_{vw} = \Omega_{vw} \quad \forall (v,w) \in B$ or $v = w$.

The non-zero entries of Ω appearing in (8.3) are freely varying real numbers that are subject only to the requirement that Ω be positive definite. For cyclic graphs, (8.1) contains rational equations. Hence, the focus is on (8.2), which defines a polynomial equation system even when the graph is cyclic.
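For completeness, the passage from (8.1) to (8.2) and (8.3) can be written out as follows. This is a short sketch that relies on the invertibility of $I - \Lambda$, which we take to be what membership in $\mathbb{R}^D_{\mathrm{reg}}$ guarantees, and on the zero pattern of Ω for pairs of distinct nodes that are not joined by a bidirected edge.

```latex
\begin{align*}
\Sigma = (I-\Lambda)^{-T}\,\Omega\,(I-\Lambda)^{-1}
  &\iff (I-\Lambda)^{T}\,\Sigma\,(I-\Lambda) = \Omega, \\
\Omega_{vw} = 0 \ \text{ for } v \neq w,\ (v,w) \notin B
  &\implies \bigl[(I-\Lambda)^{T}\Sigma(I-\Lambda)\bigr]_{vw} = 0
  \ \text{ for such } (v,w).
\end{align*}
```

Reading the matrix identity in the first line entry by entry gives (8.2) on the entries where Ω is forced to vanish and (8.3) on the remaining entries.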

We prove the theorem by solving the equations (8.2) in stepwise manner according to the ordering ≺. When visiting node v, the goal is to recover the vth column of Λ as a function of Σ. Based on solving linear equation systems, the functions of Σ that give the entries of Λ will always be rational functions, proving our stronger claim of rational (as opposed to mere generic) identifiability.


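For our proof we proceed by induction and assume that, for all $w \prec v$, we have recovered the entries of the vector $\Lambda_{P(w),w}$ as (rational) expressions in Σ. To solve for $\Lambda_{P(v),v}$, let $Y_v = \{y_1, \dots, y_n\}$ and $P(v) = \{p_1, \dots, p_n\}$. Define $A \in \mathbb{R}^{n \times n}$ as

$$A_{ij} = \begin{cases} \bigl[(I - \Lambda)^{T}\Sigma\bigr]_{y_i p_j} & \text{if } y_i \in H(v), \\ \Sigma_{y_i p_j} & \text{if } y_i \notin H(v). \end{cases}$$

Define $b \in \mathbb{R}^{n}$ as

$$b_i = \begin{cases} \bigl[(I - \Lambda)^{T}\Sigma\bigr]_{y_i v} & \text{if } y_i \in H(v), \\ \Sigma_{y_i v} & \text{if } y_i \notin H(v). \end{cases}$$

Note that both A and b depend only on Σ and the columns $\Lambda_{P(w),w}$ with $w \in Y_v \cap H(v)$, which are assumed already to be known as a function of Σ because $w \in Y_v \cap H(v)$ implies $w \prec v$. We now claim that the vector $\Lambda_{P(v),v}$ solves the equation system $A \cdot \Lambda_{P(v),v} = b$.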

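First, consider an index i with $y_i \in Y_v \cap H(v)$. Since $Y_v$ satisfies the half-trek criterion with respect to v, the node $y_i \neq v$ is not a sibling of v. Therefore, by (8.2),

$$\bigl[(I - \Lambda)^{T}\Sigma(I - \Lambda)\bigr]_{y_i v} = 0 \implies \bigl[(I - \Lambda)^{T}\Sigma\Lambda\bigr]_{y_i v} = \bigl[(I - \Lambda)^{T}\Sigma\bigr]_{y_i v}.$$

It follows that

$$\bigl(A \cdot \Lambda_{P(v),v}\bigr)_i = \sum_{j=1}^{n} \bigl[(I - \Lambda)^{T}\Sigma\bigr]_{y_i p_j} \Lambda_{p_j v} = \bigl[(I - \Lambda)^{T}\Sigma\Lambda\bigr]_{y_i v} = \bigl[(I - \Lambda)^{T}\Sigma\bigr]_{y_i v} = b_i.$$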

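Second, let i be an index with $y_i \in Y_v \setminus H(v)$. Then

$$\bigl(A \cdot \Lambda_{P(v),v}\bigr)_i = \sum_{j=1}^{n} \Sigma_{y_i p_j} \Lambda_{p_j v} = [\Sigma\Lambda]_{y_i v} = \bigl[(I - \Lambda)^{-T}\Omega(I - \Lambda)^{-1}\Lambda\bigr]_{y_i v}.$$

By definition of H(v), we know that $\bigl[(I - \Lambda)^{-T}\Omega\bigr]_{y_i v} = 0$. Adding this zero and using that $(I - \Lambda)^{-1} = I + (I - \Lambda)^{-1}\Lambda$, we obtain that

$$\bigl(A \cdot \Lambda_{P(v),v}\bigr)_i = \bigl[(I - \Lambda)^{-T}\Omega(I - \Lambda)^{-1}\Lambda\bigr]_{y_i v} + \bigl[(I - \Lambda)^{-T}\Omega\bigr]_{y_i v} = \bigl[(I - \Lambda)^{-T}\Omega(I - \Lambda)^{-1}\bigr]_{y_i v} = \Sigma_{y_i v} = b_i.$$

Therefore, $A \cdot \Lambda_{P(v),v} = b$, as claimed.

By Lemma 2 below, the matrix A is invertible in the generic situation. Therefore, we have shown that $\Lambda_{P(v),v} = A^{-1}b$ is a rational function of Σ. Proceeding inductively according to the vertex ordering ≺, we recover $\Lambda_{P(v),v}$ for all v and, thus, the entire matrix Λ, as desired.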
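To illustrate the recovery step, the following sketch works through a small hypothetical example that is not taken from the paper: the graph 1 → 2 → 3 with the bidirected edge 2 ↔ 3, with parameter values chosen arbitrarily. For this graph one may take $Y_2 = \{1\}$ and $Y_3 = \{1\}$; node 1 lies in neither H(2) nor H(3), so only the second branch in the definitions of A and b is exercised. Exact rational arithmetic from the Python standard library is used, and all helper names are ours.

```python
from fractions import Fraction as Fr


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum((X[i][k] * Y[k][j] for k in range(len(Y))), Fr(0))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def sub(X, Y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def identity(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]


# Hypothetical parameters; node labels 1, 2, 3 map to indices 0, 1, 2.
lam12, lam23 = Fr(3, 2), Fr(-2, 5)
Lambda = [[Fr(0), lam12, Fr(0)],
          [Fr(0), Fr(0), lam23],
          [Fr(0), Fr(0), Fr(0)]]
Omega = [[Fr(1), Fr(0), Fr(0)],
         [Fr(0), Fr(2), Fr(1, 2)],
         [Fr(0), Fr(1, 2), Fr(3)]]

I3 = identity(3)
# Lambda is nilpotent for this acyclic graph, so (I - Lambda)^{-1} = I + Lambda + Lambda^2.
inv = add(add(I3, Lambda), matmul(Lambda, Lambda))
Sigma = matmul(matmul(transpose(inv), Omega), inv)

# Node 2: Y_2 = {1}, P(2) = {1}, hence A = Sigma_11 and b = Sigma_12.
lam12_hat = Sigma[0][1] / Sigma[0][0]
# Node 3: Y_3 = {1}, P(3) = {2}, hence A = Sigma_12 and b = Sigma_13.
lam23_hat = Sigma[0][2] / Sigma[0][1]

# With Lambda recovered, (8.2) and (8.3) return Omega.
Lambda_hat = [[Fr(0), lam12_hat, Fr(0)],
              [Fr(0), Fr(0), lam23_hat],
              [Fr(0), Fr(0), Fr(0)]]
I_minus = sub(I3, Lambda_hat)
Omega_hat = matmul(matmul(transpose(I_minus), Sigma), I_minus)
print(lam12_hat == lam12, lam23_hat == lam23, Omega_hat == Omega)
```

The second step is the familiar instrumental-variable ratio $\Sigma_{13}/\Sigma_{12}$; it requires $\Sigma_{12} \neq 0$, which is the generic invertibility of A in this one-dimensional case.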

Lemma 2. Let $v \in V$ be any node. Let $Y \subset V \setminus (\{v\} \cup S(v))$, with $|Y| = |P(v)| = n$. Write $Y = \{y_1, \dots, y_n\}$ and $P(v) = \{p_1, \dots, p_n\}$, and define the matrix A as

$$A_{ij} = \begin{cases} \bigl[(I - \Lambda)^{T}\Sigma\bigr]_{y_i p_j}, & y_i \in H(v), \\ \Sigma_{y_i p_j}, & y_i \notin H(v). \end{cases}$$

If Y satisfies the half-trek criterion with respect to v, then A is generically invertible.


Proof. Recall the trek-rule from (2.3). Let $H(v,w) \subset T(v,w)$ be the set of all half-treks from v to w. Then, for each $i, j \in \{1, \dots, n\}$,

$$A_{ij} = \begin{cases} \sum_{\pi \in H(y_i,p_j)} \pi(\lambda,\omega), & y_i \in H(v), \\ \sum_{\pi \in T(y_i,p_j)} \pi(\lambda,\omega), & y_i \notin H(v). \end{cases}$$

For a system of treks Π, define the monomial

$$\Pi(\lambda,\omega) = \prod_{\pi \in \Pi} \pi(\lambda,\omega).$$

Then

$$\det(A) = \sum_{\Psi : Y \Rightarrow P} (-1)^{|\Psi|}\, \Psi(\lambda,\omega),$$

where the sum is over systems of treks Ψ for which all treks $\psi \in \Psi$ with sources in H(v) are half-treks. (The sign $|\Psi|$ is the sign of the permutation that writes $p_1, \dots, p_n$ in the order of their appearance as targets of the treks in Ψ.)

By assumption, there exists some system of half-treks with no sided intersection from Y to P. Let Π be such a system, with minimal total length among all such systems. Now take any system of treks Ψ from Y to P, such that $\Pi(\lambda,\omega) = \Psi(\lambda,\omega)$. (We do not assume that Ψ has no sided intersection, or has any half-treks.) In Lemma 3 immediately below, we prove that Ψ = Π for any such Ψ. Therefore, the coefficient of the monomial $\Pi(\lambda,\omega)$ in det(A) is given by $(-1)^{|\Pi|}$, and det(A) is not the zero polynomial/power series. For generic choices of (Λ,Ω) it thus holds that $\det(A) \neq 0$.

Lemma 3. Suppose $Y, P \subset V$ are subsets of equal cardinality, and $\Pi : Y \Rightarrow P$ is a system of half-treks with no sided intersection, with minimal total length among all such systems. If a system of treks $\Psi : Y \Rightarrow P$ satisfies $\Psi(\lambda,\omega) = \Pi(\lambda,\omega)$, then Ψ = Π.

Proof. Let $Y = \{y_1, \dots, y_n\}$, $P = \{p_1, \dots, p_n\}$, and $\Pi = \{\pi_1, \dots, \pi_n\}$, where $\pi_i$ has source $y_i$ and target $p_i$. Since Π has minimal total length among all systems of half-treks from Y to P with no sided intersection, Π cannot have a sub-system of the form

$$\begin{aligned} \pi_{i_1} &: y_{i_1} \cdots y_{i_2} \cdots p_{i_1}, \\ \pi_{i_2} &: y_{i_2} \cdots y_{i_3} \cdots p_{i_2}, \\ &\;\;\vdots \\ \pi_{i_{r-1}} &: y_{i_{r-1}} \cdots y_{i_r} \cdots p_{i_{r-1}}, \\ \pi_{i_r} &: y_{i_r} \cdots y_{i_1} \cdots p_{i_r}. \end{aligned}$$

If there were such a sub-system, each trek in the sub-system could be shortened, that is, replace $\pi_{i_1} : y_{i_1} \cdots y_{i_2} \cdots p_{i_1}$ with its second section, $y_{i_2} \cdots p_{i_1}$, etc. Therefore, we can relabel the elements of Y, P and Π such that $j \leq i$ if trek $\pi_i$ contains $y_j$.

Write the second system of treks as $\Psi = \{\psi_1, \dots, \psi_n\}$, where $\psi_i$ has source $y_i$ and target $p_{\alpha(i)}$. Here, α is some permutation of the indices in [n]. We claim that α(n) = n and $\psi_n = \pi_n$. Assuming this is true, let $Y' = \{y_1, \dots, y_{n-1}\}$ and $P' = \{p_1, \dots, p_{n-1}\}$, and let Π′ and Ψ′ be the induced sub-systems of treks from Y′ to P′. The ordering on Y′ follows the same rule as the ordering on Y. Then $\psi_n = \pi_n$ implies that $\Pi'(\lambda,\omega) = \Pi(\lambda,\omega)/\pi_n(\lambda,\omega) = \Psi(\lambda,\omega)/\psi_n(\lambda,\omega) = \Psi'(\lambda,\omega)$. By induction on n, we conclude that Ψ = Π.


It remains to show that α(n) = n and $\psi_n = \pi_n$. Write

(8.4) $\pi_n : y_n \to z^n_1 \to z^n_2 \to \dots \to z^n_k = p_n,$

where the first edge, written $y_n \to z^n_1$, represents either a directed edge $y_n \to z^n_1$ or a bidirected edge $y_n \leftrightarrow z^n_1$. By definition of the ordering on Y, the node $y_n$ does not appear in any trek in Π, except for $\pi_n$. And, node $y_n$ appears only once in $\pi_n$, since Π has minimal total length. Hence, the only edge in Π containing $y_n$ is the edge $y_n \to z^n_1$. Since $\Psi(\lambda,\omega) = \Pi(\lambda,\omega)$, this implies that the only edge in Ψ containing $y_n$ is the same edge $y_n \to z^n_1$. Therefore, $\psi_n$ must be of the form

$$\psi_n : y_n \to z^n_1 \cdots .$$

Case 1: The path $\psi_n$ consists of only the edge $y_n \to z^n_1$. Then $z^n_1 \in P$. If $z^n_1 = p_j$ for j < n, then Π would have a sided intersection, which is a contradiction. Therefore, $z^n_1 = p_n$. Since Π is a system of minimal length, $\pi_n$ must also consist of only $y_n \to z^n_1 = p_n$, which shows that $\psi_n = \pi_n$.

Case 2: The path $\psi_n$ is of the form

$$\psi_n : y_n \to z^n_1 \to \cdots .$$

Since Π has no sided intersection, there is no edge of the form $p_n \to \cdot$ in Π. Since $\Psi(\lambda,\omega) = \Pi(\lambda,\omega)$, we obtain that $z^n_1 \neq p_n$, and thus k ≥ 2 in (8.4). Now observe that the only edge of the form $z^n_1 \to \cdot$ in Π is the edge $z^n_1 \to z^n_2$; otherwise, two treks in Π would have a sided intersection at $z^n_1$. It follows that

$$\psi_n : y_n \to z^n_1 \to z^n_2 \cdots .$$

Continue now to add edges one at a time to the path $\psi_n$, applying the reasoning just used at all but the last edge of $\psi_n$. Reasoning as in Case 1 for the last edge of $\psi_n$, we find that $\psi_n$ and $\pi_n$ are both equal to

$$y_n \to z^n_1 \to z^n_2 \to \dots \to z^n_k = p_n.$$

This completes the proof that α(n) = n and $\psi_n = \pi_n$.

We now turn to the proof of the non-identifiability theorem.

Theorem 2 (HTC-non-identifiability). Suppose G is a mixed graph in which every family $(Y_v : v \in V)$ of subsets of the vertex set V either contains a set $Y_v$ that fails to satisfy the half-trek criterion with respect to v or contains a pair of sets $(Y_v, Y_w)$ with $v \in Y_w$ and $w \in Y_v$. Then the parametrization $\phi_G$ is generically infinite-to-one.

Proof of Theorem 2. Let

$$N = \bigl\{ \{v,w\} : v \neq w,\ (v,w) \notin B \bigr\}$$

be the set of (unordered) 'nonsibling pairs' in the graph. Treating Σ as fixed, let $J \in \mathbb{R}^{|N| \times |D|}$ be the Jacobian of the equations in (8.2), taking partial derivatives with respect to the non-zero entries of Λ. The entries of J are given by

(8.5) $J_{\{v,w\},(u,v)} = -\bigl[(I - \Lambda)^{T}\Sigma\bigr]_{wu}, \quad \{v,w\} \in N,\ u \in P(v),$

and all other entries zero. By Lemma 4 below, it is sufficient to show that, under the conditions of the theorem, J does not have full column rank.

In the remainder of this proof, we always let $\Sigma = \phi_G(\Lambda,\Omega)$ when considering J. If J has generically full column rank, then we can choose a set $M \subset N$ with $|M| = |D| = \sum_{v \in V} |P(v)|$, such that $\det(J_{M,D})$ is not the zero polynomial, where


$J_{M,D}$ is the square submatrix formed by taking all rows of J that are indexed by M. By the definition of the determinant, there must be a partition $M = \bigcup_v M_v$ such that for all v, we have

$$\det\bigl(J_{M_v,(P(v),v)}\bigr) \neq 0.$$

By (8.5), each entry $\{w_1, w_2\} \in M_v$ must have either $w_1 = v$ or $w_2 = v$. Writing $Y_v = \{w : \{v,w\} \in M_v\}$, it holds that

$$\det\Bigl(\bigl[(I - \Lambda)^{T}\Sigma\bigr]_{Y_v,P(v)}\Bigr) = \pm\det\bigl(J_{\{Y_v,v\},(P(v),v)}\bigr) = \pm\det\bigl(J_{M_v,(P(v),v)}\bigr)$$

is non-zero. By Lemma 5 below, this implies that each set $Y_v$ satisfies the half-trek criterion with respect to its indexing node v. Forming a partition of $M \subset N$, the sets $M_v$ are pairwise disjoint. Hence, no two nodes v, w can satisfy both $v \in Y_w$ and $w \in Y_v$ because otherwise $\{v,w\} \in M_v \cap M_w$.

Lemma 4. Define J as in (8.5). If J does not have full column rank, then the parametrization $\phi_G$ is generically infinite-to-one.

Proof. The parametrization $\phi_G$ maps the $(|D| + |B| + m)$-dimensional set $\Theta = \mathbb{R}^D_{\mathrm{reg}} \times PD(B)$ to the $\binom{m+1}{2}$-dimensional space of symmetric m × m matrices. Since $\phi_G$ is a rational map, its Jacobian matrix $J(\phi_G)$ achieves its maximal rank at generic points in Θ. This maximal rank is the dimension of the image of $\phi_G$. If the dimension is smaller than |D| + |B| + m, then, for generic choices of (Λ,Ω) ∈ Θ, the fiber F(Λ,Ω) has positive dimension and is, in particular, infinite. Therefore, our theorem is proven if we can show that, under the assumed conditions, the Jacobian of $\phi_G$ does not have full column rank.

We now claim that the Jacobian of $\phi_G$, $J(\phi_G)$, is of full column rank at (Λ,Ω) if and only if J has full column rank at Λ when taking $\Sigma = \phi_G(\Lambda,\Omega)$.

Consider the two maps

(8.6) $h : (\Lambda,\Omega) \mapsto (\Lambda, \phi_G(\Lambda,\Omega))$ and $g : (\Lambda,\Sigma) \mapsto (I - \Lambda)^{T}\Sigma(I - \Lambda),$

where the domain of h is Θ and the domain of g is $\mathbb{R}^D_{\mathrm{reg}} \times PD_m$. The composition of the two maps satisfies

(8.7) $(g \circ h)(\Lambda,\Omega) = \Omega.$

Partition $J(\phi_G) = (J_\Lambda(\phi_G), J_\Omega(\phi_G))$, where the two parts hold the partial derivatives with respect to the |D| free entries of Λ and the |B| + m free entries of Ω, respectively. Similarly, partition the Jacobian $J(g) = (J_\Lambda(g), J_\Sigma(g))$. Taking derivatives in (8.7), we obtain that

(8.8) $J_\Lambda(g)(\Lambda, \phi_G(\Lambda,\Omega)) + J_\Sigma(g)(\Lambda, \phi_G(\Lambda,\Omega))\, J_\Lambda(\phi_G)(\Lambda,\Omega) = 0$

and

(8.9) $J_\Sigma(g)(\Lambda, \phi_G(\Lambda,\Omega))\, J_\Omega(\phi_G)(\Lambda,\Omega) = \begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix},$

where we have ordered rows and columns such that the pairs (v,w) defining elements in N are listed first. Hence, the identity matrix in the lower-right block of the right-hand side of (8.9) is of size |B| + m, and indexed by B ∪ V. Under the same ordering of rows, observe that using the Jacobian in (8.5) we have

$$J_\Lambda(g)(\Lambda, \phi_G(\Lambda,\Omega)) = \begin{pmatrix} J|_{\Sigma = \phi_G(\Lambda,\Omega)} \\ 0 \end{pmatrix}.$$


Combining (8.8) and (8.9), we obtain

(8.10) $J_\Sigma(g)(\Lambda, \phi_G(\Lambda,\Omega)) \cdot J(\phi_G)(\Lambda,\Omega) = \begin{pmatrix} -J|_{\Sigma = \phi_G(\Lambda,\Omega)} & 0 & 0 \\ 0 & I & 0 \end{pmatrix},$

where the two blocks of rows are indexed by N and B ∪ V, and the three blocks of columns are indexed by D, B ∪ V, and N. Now note that the restriction of g obtained by fixing Λ is an injective map with continuous inverse $\Sigma \mapsto (I - \Lambda)^{-T}\Sigma(I - \Lambda)^{-1}$. Therefore, the matrix $J_\Sigma(g)$ is invertible, and we deduce that the rank of $J(\phi_G)$ at (Λ,Ω) is equal to the sum of |B| + m and the rank of J at Λ and $\Sigma = \phi_G(\Lambda,\Omega)$. This proves our claim relating the rank of $J(\phi_G)$ and that of J.

Lemma 5. Let $v \in V$ be any node. Let $Y \subset V \setminus (\{v\} \cup S(v))$, with $|Y| = |P(v)| = n$. If the matrix $J = [(I - \Lambda)^{T}\Sigma]_{Y,P(v)}$ is generically invertible, then Y satisfies the half-trek criterion with respect to v.

Proof. Abbreviate P = P(v). We have $J = [(I - \Lambda)^{T}\Sigma]_{Y,P} = [\Omega(I - \Lambda)^{-1}]_{Y,P}$. Hence,

$$\det(J) = \sum_{W \subset V,\, |W| = n} \det(\Omega_{Y,W})\, \det\bigl((I - \Lambda)^{-1}_{W,P}\bigr).$$

By assumption, det(J) is not the zero polynomial/power series. Therefore, for some $W \subset V$ with |W| = n, we have $\det(\Omega_{Y,W}) \not\equiv 0$ and $\det((I - \Lambda)^{-1}_{W,P}) \not\equiv 0$.

By Menger's theorem (see, for instance, [Sch04, Theorem 9.1]), the non-vanishing of $\det((I - \Lambda)^{-1}_{W,P})$ implies that there is a system Ψ of pairwise vertex-disjoint directed paths $\psi_i : w_i \to \dots \to p_i$, $i \in [n]$, whose sources and targets give $W = \{w_1, \dots, w_n\}$ and $P = \{p_1, \dots, p_n\}$, respectively. Indeed, if no such system exists, then by Menger's theorem there is a set C of strictly fewer than n vertices such that all directed paths from W to P pass through C. But this implies that the matrix $(I - \Lambda)^{-1}_{W,P}$ factors as $(I - \Lambda)^{-1}_{W,C} \cdot (I - \Lambda)^{-1}_{C,P}$, and |C| < n implies that $\det((I - \Lambda)^{-1}_{W,P}) = 0$, a contradiction. Note that by erasing loops, we can further arrange that the $\psi_i$ do not have self-intersections.

Since $\det(\Omega_{Y,W}) \neq 0$, we can index $Y = \{y_1, \dots, y_n\}$ such that $\Omega_{y_i w_i} \neq 0$ for all i. This implies that either $y_i = w_i$ or $y_i \leftrightarrow w_i \in B$. Now define a system of half-treks $\Pi : Y \Rightarrow P$ by setting $\pi_i = \psi_i$ if $w_i = y_i$, and extending $\psi_i$ at the left-hand side to

$$\pi_i = y_i \leftrightarrow w_i \to \dots \to p_i$$

if $y_i \neq w_i$. Since Ψ has no sided intersection, Π also has no sided intersection. It follows that Y satisfies the half-trek criterion with respect to v.

9. Proofs for the weak half-trek criterion

Lemma 1. Suppose the set $W \subset V$ satisfies the weak half-trek criterion with respect to some node v. Then there exists a set Y satisfying the half-trek criterion with respect to v, such that $Y \cap H(v) = W \cap H(v)$.

Proof. Let $\Pi : W \Rightarrow P(v)$ be a system of treks satisfying the conditions of the weak half-trek criterion. Let r be the number of treks in Π which are not half-treks, and suppose r > 0. Using induction, it suffices to show that there is a set W′ satisfying the weak half-trek criterion with respect to v via some trek system Π′ with no more than r − 1 treks that are not half-treks, and for which $W' \cap H(v) = W \cap H(v)$.


Take any $w \in W$ for which the trek $\pi \in \Pi$ with source w is not a half-trek. By the definition of the weak half-trek criterion, this implies that $w \notin H(v)$. Let $w' \neq w$ be the (unique) node in the left-hand side of π that is closest to the target of π, which we denote t(π). The trek π has the structure

$$w \leftarrow \dots \leftarrow w' \cdots t(\pi).$$

Let π′ be the subtrek from w′ to t(π). Then π′ is a half-trek. Since w is a descendant of w′ and $w \notin H(v)$, this implies $w' \notin H(v)$ and $w' \notin S(v)$. Furthermore, $w' \notin W \setminus \{w\}$ because π has no sided intersection and w′ is in the left-hand side of π.

Define $W' = (W \setminus \{w\}) \cup \{w'\}$ and $\Pi' = (\Pi \setminus \{\pi\}) \cup \{\pi'\}$. Since the original system of treks Π had no sided intersection, the new system of treks Π′ also has no sided intersection. Precisely r − 1 of the treks in Π′ are not half-treks. Moreover, since $w, w' \notin H(v)$, it holds that $W' \cap H(v) = W \cap H(v)$, as needed to be shown.

Theorem 3 (Weak HTC). Theorems 1 and 2 hold when using the weak half-trek criterion instead of the half-trek criterion. Moreover, a graph G can be proved to be rationally identifiable (or generically infinite-to-one) using the weak half-trek criterion if and only if G is HTC-identifiable (or HTC-infinite-to-one).

Proof. It is sufficient to show the following two facts:

(a) If G can be proved to be rationally identifiable using the weak half-trek criterion, then G is HTC-identifiable.

(b) If G cannot be proved to be generically infinite-to-one using the weak half-trek criterion, then G is not HTC-infinite-to-one.

Part (a). A graph G can be proved to be rationally identifiable using the weak half-trek criterion if there is, for each v, a set of nodes $W_v$ satisfying the weak half-trek criterion with respect to v, and an ordering ≺ such that w ≺ v for any $w \in W_v \cap H(v)$. By Lemma 1, for each v, there is then also a set $Y_v$ satisfying the half-trek criterion with respect to v, with $Y_v \cap H(v) = W_v \cap H(v)$. Therefore, G is seen to be HTC-identifiable using the same ordering ≺.

Part (b). If G cannot be proved to be generically infinite-to-one using the weak half-trek criterion, then there is a family $(W_v : v \in V)$, such that each $W_v$ satisfies the weak half-trek criterion with respect to v, and $v \in W_w$ implies $w \notin W_v$. Using Lemma 1, we can find, for each v, a set $Y_v$ that satisfies the half-trek criterion with respect to v and for which $Y_v \cap H(v) = W_v \cap H(v)$. Now suppose $v \in Y_w$ for two nodes $v, w \in V$. This means that $v \notin S(w) \cup \{w\}$ and there is a half-trek π with source v and target w, which implies that $w \in H(v)$. If also $w \in Y_v$, then $w \in Y_v \cap H(v) = W_v \cap H(v)$. By symmetry, we also get $v \in W_w$. This contradicts our assumption, and so $w \notin Y_v$. This proves that G cannot be proved to be generically infinite-to-one using the half-trek criterion.

10. Proofs for half-trek versus G-criterion

In this section, we assume that G is an acyclic mixed graph whose vertex set V = [m] is enumerated according to some topological ordering under which Theorem 4 applies, making the graph GC-identifiable. Let Av be the sets from Theorem 4. Recalling Definition 6, for each node v ∈ V, let Yv ∪ Zv = Av be the partition that, together with the systems of treks Πv : Yv ⇒ P(v) and Ψv : Zv ⇒ S<(v), witnesses that Av satisfies the G-criterion with respect to v. For each v, for each π ∈ Πv (and each ψ ∈ Ψv), define π′ (or ψ′) by extending π (or ψ) with the edge t(π) → v (or t(ψ) ↔ v), as in Definition 6.

Lemma 6. Consider any node v ∈ V. If w ∈ Left(π) for some trek π ∈ Πv, then w ≠ v and w ∉ S(v).

Proof. Let y and t(π) be the source and the target of π, respectively.

First, suppose that w < v. If w ∈ S(v), then there is a trek ψ ∈ Ψv with source z ∈ Zv and target w that extends to a trek ψ′ when appending the edge w ↔ v to the right-hand side. Since there would then be a sided intersection between ψ′ and π′, we cannot have w ∈ S<(v).

Next, suppose that w ≥ v, and condition (C1) of Theorem 4 is satisfied. If w = v, then Depth(y) ≥ Depth(w) = Depth(v) gives a contradiction to (C1). If instead w > v, then Depth(w) ≤ Depth(y) < Depth(v), by (C1). Suppose w ∈ S(v), and consider Aw. Since v ∈ S<(w), there is a trek ψ′ of the form z · · · v ↔ w with source z ∈ Zw. But then Depth(z) ≥ Depth(v) > Depth(w), which contradicts (C1). Hence, we cannot have w = v or w ∈ S>(v) if condition (C1) is true.

Next, suppose that w = v, and condition (C2) of Theorem 4 is satisfied. If π is a half-trek, then v = w = y ∈ Av, a contradiction. If w = v and π is not a half-trek, then y is a proper descendant of w = v, and so y ∈ H(v). But then (C2) requires that π be a half-trek, a contradiction. Therefore, w ≠ v if condition (C2) is true.

Finally, suppose that w > v, and condition (C2) of Theorem 4 is satisfied. If w ∈ S(v), then y = w or y is a proper descendant of w. In either case, y ∈ H(v) ∪ S>(v), and so π must be a half-trek. It follows that π has source node w = y, which implies that w ≺ v in the ordering specified by condition (C2). We now consider Aw. Since v ∈ S<(w), there is a trek ψ′ of the form z · · · v ↔ w with source z ∈ Zw. Then v ∈ H(w) because of the half-trek π′ from w to v. Moreover, either z = v or z is a proper descendant of v. Therefore, z ∈ H(w), and so ψ′ must be a half-trek, implying that z = v. It follows that v ∈ Aw ∩ H(w), and so v ≺ w in the ordering specified by condition (C2). This is a contradiction. Therefore, we cannot have w ∈ S>(v) if condition (C2) is true.

We now prove the theorem.

Theorem 5. A GC-identifiable acyclic mixed graph is also HTC-identifiable.

Proof. First, consider the case that condition (C1) of Theorem 4 holds. For each v, we can uniquely decompose each trek π ∈ Πv as

y(π) ← · · · ← y∗(π) · · · t(π),

where y(π) ∈ Yv is the source and t(π) ∈ P(v) the target of π, and the subtrek π∗ from y∗(π) to t(π) is a half-trek. By Lemma 6, y∗(π) ≠ v and y∗(π) ∉ S(v). Furthermore, for two distinct treks π1, π2 ∈ Πv, we must have y∗(π1) ≠ y∗(π2), because otherwise there would be a sided intersection between the extensions π′1 and π′2 of π1 and π2, respectively. Now define Y∗v = {y∗(π) : π ∈ Πv}. Using the system of half-treks Φv = {π∗ : π ∈ Πv}, we see that Y∗v satisfies the half-trek criterion with respect to v, for each v. Finally, define a total ordering ≺ on V that agrees with the partial ordering induced by depth. Observe that for all v and all π ∈ Πv, it holds that Depth(y∗(π)) ≤ Depth(y(π)) < Depth(v), by condition (C1). Hence, for any y ∈ Y∗v ∩ H(v), we must have Depth(y) < Depth(v), and so y ≺ v. Consequently, the conditions of Theorem 1 are satisfied, and the graph G is HTC-identifiable.

Next, consider the case that condition (C2) of Theorem 4 holds. For each v, by Lemma 6, Yv is disjoint from S(v) ∪ {v}. By the G-criterion, the system of treks Πv : Yv ⇒ P(v) has no sided intersection. By condition (C2), a trek π ∈ Πv is a half-trek whenever the source y(π) ∈ Yv ∩ H(v). Therefore, the set Yv satisfies the weak half-trek criterion with respect to v. Finally, take the ordering ≺ specified by condition (C2). For each y ∈ Yv ∩ H(v), we must have y ≺ v, by condition (C2). Therefore, using the weak half-trek method in Theorem 3, the graph G is seen to be HTC-identifiable.

Proposition 3. The acyclic mixed graph in Figure 2 is not GC-identifiable.

Proof of Proposition 3. First, note that the sibling sets S<(v) are unique because this graph G has a unique topological ordering. Next, observe that with both 1 → 2 and 1 ↔ 2 in the graph, |P(2)| + |S<(2)| = 2. But only node 1 has depth smaller than node 2. Therefore, G cannot be GC-identifiable via condition (C1) of Theorem 4, and it remains to consider condition (C2).

Node 2: We have S<(2) = {1} and must therefore find a set Z2 = {v} such that there exists a trek π of the form v ← · · · ← 1 ↔ 2. If v ∈ {3, 4, 5}, then v ∈ Z2 ∩ H(2), and π would need to be a half-trek, which is a contradiction. We conclude that Z2 = {1}, implying 1 ∉ Y2. The parent set of node 2 is P(2) = {1}, and we must find Y2 = {v} for a node v ∈ {3, 4, 5} that is the source of a trek π of the form v · · · 1 → 2. Since {3, 4, 5} ⊂ H(2), the trek π must be a half-trek. This restricts the choice to v ∈ {4, 5}. Therefore, either 4 ∈ Y2 ∩ H(2) or 5 ∈ Y2 ∩ H(2). Hence, either 4 ≺ 2 or 5 ≺ 2.

Node 4: Starting from 3 ∈ S<(4) and reasoning as for Z2 before, we must have 3 ∈ Z4 and, consequently, 3 ∉ Y4. Since P(4) = {3}, we must have Y4 = {v} with a trek π of the form v · · · 3 → 4. The set Z4 must contain a node w at the source of a trek w · · · 1 ↔ 4. Hence, v = 1 cannot be in Y4 since a sided intersection in the system of treks would be created. Therefore, we must have v ∈ {2, 5} ⊂ H(4), and it follows that either 2 ≺ 4 or 5 ≺ 4.

Node 5: We have 4 ∈ S<(5). By the same reasoning as for Z2 and Z4, it holds that 4 ∈ Z5. Since 4 ∈ H(5), this means that 4 ≺ 5.

We conclude that a total ordering ≺ as required for GC-identifiability would have to satisfy either 4 ≺ 5 ≺ 4, or 2 ≺ 4 ≺ 2, or 2 ≺ 4 ≺ 5 ≺ 2. Consequently, no such ordering exists.
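The final step of this proof is a small combinatorial statement that can be checked mechanically. The sketch below enumerates all total orderings of the five nodes and tests them against the three precedence requirements derived above for nodes 2, 4 and 5. The names `CLAUSES`, `satisfies` and `feasible_orderings` are illustrative and not part of the paper; the check covers only the ordering constraints, not the trek arguments that produced them.

```python
from itertools import permutations

# Each clause lists pairs (a, b) meaning "a precedes b";
# at least one pair of every clause has to hold.
CLAUSES = [
    [(4, 2), (5, 2)],  # node 2: either 4 < 2 or 5 < 2
    [(2, 4), (5, 4)],  # node 4: either 2 < 4 or 5 < 4
    [(4, 5)],          # node 5: 4 < 5
]

def satisfies(order, clauses):
    """Return True if the total ordering `order` meets every clause."""
    pos = {node: i for i, node in enumerate(order)}
    return all(any(pos[a] < pos[b] for a, b in clause) for clause in clauses)

def feasible_orderings(nodes, clauses):
    """Brute-force search over all total orderings of `nodes`."""
    return [order for order in permutations(nodes) if satisfies(order, clauses)]

witnesses = feasible_orderings(range(1, 6), CLAUSES)
print(len(witnesses))  # 0: no ordering satisfies all three requirements
```

Dropping the clause for node 5 makes the search succeed (for example with 5 before 2 before 4), which shows that all three requirements are needed for the contradiction.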

11. Proofs for graph decomposition

Lemma 7. Let v be a node in the mixed component G′ of an acyclic mixed graph G. Consider the set H(v) in G, and let H′(v) be the analogue in G′. If there is a set Y that satisfies the half-trek criterion with respect to v in G, then there is a set Y′ that satisfies the half-trek criterion with respect to v in G′, and Y′ ∩ H′(v) ⊂ Y ∩ H(v).

Proof. Let V′ be the vertex set of G′, and let C′ ⊂ V′ be the vertex set of the bidirected connected component of G that defined G′. We may assume that v ∈ C′, for otherwise v has no parents in G′ and the claims concern empty sets. Choose a system of half-treks Π : Y ⇒ P(v) with no sided intersection and with Y ∩ ({v} ∪ S(v)) = ∅. Since P(v) ⊂ V′, each half-trek π ∈ Π eventually visits only nodes in V′. Now take Π′ to be the set of half-treks obtained by retaining the longest subtrek of each half-trek π ∈ Π that remains entirely in G′ and contains the target of π. If π′ ∈ Π′ is derived from π ∈ Π, then either (i) π′ = π or (ii) π′ is a directed path and its source y′ is an element of V′ \ C′.

First, we claim that Π′ is a system of half-treks. In other words, we claim that the sources y′1 and y′2 of two distinct half-treks π′1, π′2 ∈ Π′ satisfy y′1 ≠ y′2. Let π1 and π2 be the half-treks in Π that yielded π′1 and π′2, respectively. Since Π is without sided intersection, y′1 ≠ y′2 if both π′1 = π1 and π′2 = π2. If, without loss of generality, π′1 ≠ π1, then y′1 ∈ Right(π1), and y′1 ∉ C′. Now suppose y′1 = y′2. Since Π has no sided intersection, we must have y′2 ∉ Right(π2). This implies that π′2 starts with a bidirected edge and, thus, π′2 = π2, and therefore y′2 ∈ C′, while y′1 ∉ C′. Consequently, y′1 ≠ y′2.

Second, we claim that Π′ has no sided intersections. Consider any two distinct half-treks π1, π2 ∈ Π. Since π′1 and π′2 are half-treks, Left(π′1) = {y′1} and Left(π′2) = {y′2}. Above, we showed that y′1 ≠ y′2, and therefore Left(π′1) ∩ Left(π′2) = ∅. Next we consider the right-hand sides. By definition of π′1 and π′2, we have Right(π′1) ⊆ Right(π1) and Right(π′2) ⊆ Right(π2). Therefore

Right(π′1) ∩ Right(π′2) ⊆ Right(π1) ∩ Right(π2) = ∅.

Third, we claim that the set Y′ of sources of Π′ satisfies the half-trek criterion with respect to v in the component G′. For this it remains to show that no source in Y′ is equal to v or a sibling of v. Indeed, if the source y′ of a half-trek π′ ∈ Π′ is in S(v) ∪ {v} ⊂ C′, then π′ = π and we have a contradiction to Y ∩ ({v} ∪ S(v)) = ∅.

Finally, we claim that Y′ ∩ H′(v) ⊂ Y ∩ H(v). Since G′ is a subgraph of G, we have H′(v) ⊂ H(v). Our claim thus holds because all nodes in Y′ \ Y are in V′ \ C′ and there are no directed edges pointing to nodes in V′ \ C′ in the graph G′.

Lemma 8. Let G1, . . . , Gk be the mixed components of an acyclic mixed graph G. Suppose that G1, . . . , Gk are not HTC-infinite-to-one. Then G is not HTC-infinite-to-one.

Proof. For each j ∈ [k], choose a family (Y_v^(j) : v ∈ Vj) where each set Y_v^(j) ⊂ Vj satisfies the half-trek criterion with respect to v in the component Gj, and for all v, w ∈ Vj, either v ∉ Y_w^(j) or w ∉ Y_v^(j). Such a family exists by the assumption.

For v ∈ V, let j(v) be the unique index in [k] with v ∈ C_j(v). Define Yv = Y_v^(j(v)).

The original graph G is seen not to be HTC-infinite-to-one, if the following two claims are proven:

(a) in G, each set Yv satisfies the half-trek criterion with respect to its indexing node v;

(b) for each v ≠ w ∈ V, either v ∉ Yw or w ∉ Yv.

Proof of claim (a): Fix any v and abbreviate j = j(v). By definition, Yv = Y_v^(j) satisfies the half-trek criterion with respect to v in Gj. This implies that there is a system Π of half-treks with no sided intersection from Yv to P(v) ∩ Vj, and that Yv ⊂ Vj \ ({v} ∪ (S(v) ∩ Vj)). However, by definition of Gj, we know P(v) ⊆ Vj and S(v) ⊆ Vj. Hence, Yv satisfies the half-trek criterion with respect to v, in G.

Proof of claim (b): Fix any two nodes v ≠ w. If j(v) = j(w), then by assumption, either v ∉ Y_w^(j(w)) = Yw or w ∉ Y_v^(j(v)) = Yv. Now suppose j(v) ≠ j(w), and w ∈ Yv. Then w ∈ V_j(v) \ C_j(v), implying that there exists a directed path from w to v in G_j(v). Similarly, if v ∈ Yw, then there is a directed path from v to w in G_j(w). Since the directed part of G is acyclic, this is a contradiction.

Lemma 9. Suppose G is not HTC-infinite-to-one. Then Gj is not HTC-infinite-to-one for all j ∈ [k].

Proof. If G is not HTC-infinite-to-one, then there exists a family (Yv : v ∈ V) of subsets of V, such that for each v, Yv satisfies the half-trek criterion with respect to v, and for all v ≠ w, either v ∉ Yw or w ∉ Yv.

Now fix any j ∈ [k]. For each v ∈ Cj, we adopt the construction from the proof of Lemma 7 to obtain a system of half-treks Π′v : Y′v ⇒ P(v) in Gj that shows that Y′v satisfies the half-trek criterion with respect to v in Gj. For each v ∈ Vj \ Cj, v has no parents in Gj, and so we can define Y′v = ∅.

Now consider any v, w ∈ Vj. Suppose for a contradiction that v ∈ Y′w and w ∈ Y′v. This implies Y′v, Y′w ≠ ∅, and so v, w ∈ Cj. In this case, v is the source of some half-trek in Π′w. But then v ∈ Cj implies that this half-trek was unchanged when constructing Π′w, and thus v is also in Yw. The same argument shows that w ∈ Yv, which contradicts the assumption made for our claim.
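The decomposition into mixed components that Lemmas 7 to 9 rely on can be sketched in a few lines. The sketch assumes the definition as it is used in the proofs above: Cj is a connected component of the bidirected part, Vj consists of Cj together with the parents of its nodes, and Gj keeps the bidirected edges inside Cj and the directed edges whose head lies in Cj. The function names and the four-node example are hypothetical.

```python
def bidirected_components(nodes, bidirected):
    """Connected components of the bidirected part of the graph."""
    adj = {v: set() for v in nodes}
    for a, b in bidirected:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in nodes:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:  # depth-first search from v
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def mixed_components(nodes, directed, bidirected):
    """One mixed component (C_j, V_j and the retained edges) per bidirected component."""
    result = []
    for C in bidirected_components(nodes, bidirected):
        kept_directed = [(a, b) for a, b in directed if b in C]
        V = C | {a for a, _ in kept_directed}  # C_j plus parents of its nodes
        kept_bidirected = [(a, b) for a, b in bidirected if a in C and b in C]
        result.append({"C": C, "V": V,
                       "directed": kept_directed,
                       "bidirected": kept_bidirected})
    return result

# Hypothetical example: 1 -> 2 -> 3 -> 4 with 1 <-> 3 and 2 <-> 4.
EXAMPLE = mixed_components([1, 2, 3, 4],
                           [(1, 2), (2, 3), (3, 4)],
                           [(1, 3), (2, 4)])
for comp in EXAMPLE:
    print(sorted(comp["C"]), sorted(comp["V"]), comp["directed"])
```

In this example the node 2 belongs to V \ C for the component with C = {1, 3}; it has no incoming directed edges there, which is the situation in which the proofs above set Y′v = ∅.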

12. Conclusion

We have proposed graphical criteria for determining identifiability as well as non-identifiability of linear structural equation models. To our knowledge, our criteria are the best known. They apply to cyclic mixed graphs and, for acyclic graphs, the graph decomposition method discussed in Section 7 further extends their scope. It would be interesting to determine whether a similar graph decomposition method can be applied to cyclic graphs as well. Additionally, to better understand the “gap” between the necessary condition and the sufficient condition for rational identifiability that we have developed, we would also like to find some class of graphs, defined on an arbitrary number of nodes m, which is rationally identifiable but not HTC-identifiable.

In models that are not HTC-identifiable, the half-trek method can still prove certain parameters to be rationally identifiable; recall, for instance, the example from Figure 5(a). Referring to Theorem 1, if a set Yv satisfies the half-trek criterion with respect to the indexing node v, and Yv ∩ H(v) = ∅, then the proof of Theorem 1 shows how to obtain rational expressions in the covariance matrix Σ that equal the coefficients λwv, where w ∈ P(v). In the next step of the recursive procedure that proves Theorem 1, we can solve for any node u with Yu ∩ H(u) ⊆ {v}. Continuing in this way, individual parameters can be identified even though ultimately the procedure will stop before all nodes are visited, as we are discussing an HTC-inconclusive graph. It would be interesting to compare this partial application of the half-trek method to other graphical criteria for identification of individual edge coefficients; see in particular [GPSS10] for a review and examples of such methods.

Applying our main results, Theorems 1 and 2, requires one to find sets that satisfy the half-trek criterion with respect to a considered node. In the related context of the G-criterion, Chapter 4 in the Ph.D. thesis [Bri04] formulates this problem as a computation of maximum flow in a network. Revisiting this construction in the context of our half-trek criterion would be useful for the treatment of larger graphs and an efficient computer implementation of the methods from this paper.

Appendix A. Algebraic techniques for proving and disproving identifiability

In the proofs of our half-trek criteria we have made extensive use of the equations (8.2), namely,

\[
\bigl[(I-\Lambda)^{T}\,\Sigma\,(I-\Lambda)\bigr]_{vw} = 0 \quad \forall (v,w) \notin B \text{ and } v \neq w. \tag{A.1}
\]

In this appendix we discuss the algebro-geometric and computational content of these equations. In what follows, we write λ and ω for tuples of variables representing the non-zero entries of Λ and Ω, and σ for a tuple of variables representing the entries of Σ. We will also need the rational function δ := det(I − Λ)^{-1}; in the acyclic case, δ is just the constant 1. The entries of the inverse (I − Λ)^{-1} lie in the ring ℝ[λ, δ] of polynomials in λ and δ. The first observation is the following.
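To make the equations (A.1) concrete, the following sketch evaluates their left-hand sides in exact rational arithmetic. The example graph (1 → 2 → 3 → 4 with 2 ↔ 4), the parameter values and all function names are assumptions chosen for illustration. The inverse of I − Λ is computed through the finite series I + Λ + ... + Λ^{m−1}, which is valid only in the acyclic case.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def identity(m):
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]

def i_minus(L):
    m = len(L)
    return [[Fraction(int(i == j)) - L[i][j] for j in range(m)] for i in range(m)]

def covariance(L, Omega):
    """Sigma = (I - L)^{-T} Omega (I - L)^{-1}, assuming L is nilpotent (acyclic graph)."""
    m = len(L)
    inv, power = identity(m), identity(m)
    for _ in range(m - 1):
        power = matmul(power, L)
        inv = [[inv[i][j] + power[i][j] for j in range(m)] for i in range(m)]
    return matmul(matmul(transpose(inv), Omega), inv)

def residuals(Sigma, L, bidirected):
    """Left-hand sides of (A.1) for all v < w with (v, w) not in B (1-based keys)."""
    m = len(L)
    M = matmul(matmul(transpose(i_minus(L)), Sigma), i_minus(L))
    B = {frozenset(e) for e in bidirected}
    return {(v, w): M[v - 1][w - 1]
            for v in range(1, m + 1) for w in range(v + 1, m + 1)
            if frozenset((v, w)) not in B}

def lambda_matrix(m, coeffs):
    """Matrix with entry lambda_vw in row v, column w (1-based keys)."""
    L = [[Fraction(0)] * m for _ in range(m)]
    for (v, w), value in coeffs.items():
        L[v - 1][w - 1] = Fraction(value)
    return L

# Hypothetical example: 1 -> 2 -> 3 -> 4 with the bidirected edge 2 <-> 4.
BIDIRECTED = [(2, 4)]
LAMBDA = lambda_matrix(4, {(1, 2): 2, (2, 3): -1, (3, 4): 3})
OMEGA = [[Fraction(x) for x in row] for row in
         [[1, 0, 0, 0], [0, 2, 0, 1], [0, 0, 1, 0], [0, 1, 0, 3]]]
SIGMA = covariance(LAMBDA, OMEGA)

at_truth = residuals(SIGMA, LAMBDA, BIDIRECTED)
wrong = lambda_matrix(4, {(1, 2): 2, (2, 3): 0, (3, 4): 3})
at_wrong = residuals(SIGMA, wrong, BIDIRECTED)
print(all(r == 0 for r in at_truth.values()))  # True: (A.1) holds at the true Lambda
print(any(r != 0 for r in at_wrong.values()))  # True: a perturbed Lambda violates (A.1)
```

Read this way, (A.1) is a system of polynomial constraints on λ for a given Σ: the generating Λ satisfies all of them, while the perturbed matrix already fails the (1, 3) equation.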

Proposition 5. The left-hand sides of the equations (A.1) generate a radical ideal in the ring ℝ[λ, σ, δ].

We apply the theory of elimination ideals; see [CLO07, Chapter 3].

Proof. Recall the parametrization equation (8.1), namely,

\[
\Sigma - (I-\Lambda)^{-T}\,\Omega\,(I-\Lambda)^{-1} = 0, \tag{A.2}
\]

and interpret the individual entries of the matrix on the left-hand side as generators of an ideal J in ℝ[λ, σ, ω, δ]. The ideal J is radical because it is the ideal of the graph of the parametrization φG. Let I be the ideal generated by the left-hand sides of (A.1). We claim that I = J ∩ ℝ[λ, σ, δ]; the fact that J is radical then implies that I is radical as well.

The containment I ⊆ J ∩ ℝ[λ, σ, δ] is immediate because J contains the entries of the matrix obtained by multiplying (A.2) from the left by (I − Λ)^T and from the right by (I − Λ), and among these entries are the generators of I.

For the converse, let f ∈ J ∩ ℝ[λ, σ, δ]. Let Ω be a symmetric matrix full of auxiliary new variables ωij = ωji, even at positions that do not correspond to bidirected edges, and let g ∈ ℝ[λ, ω, δ] be the polynomial obtained from f by substituting the entries of

\[
(I-\Lambda)^{-T}\,\Omega\,(I-\Lambda)^{-1}
\]

for σ. Since f lies in J, the polynomial g becomes zero when setting all variables ωuv with u ≠ v, (u, v) ∉ B equal to zero. This means that g lies in the ideal generated by these variables, i.e.,

\[
g = \sum_{u \neq v,\ (u,v) \notin B} h_{uv} \cdot \omega_{uv}
\]

for suitable coefficients h_{uv} ∈ ℝ[λ, ω, δ]. Re-substituting (I − Λ)^T Σ (I − Λ) for Ω on the left-hand side yields f, and performing the same substitution on the right yields an ℝ[λ, σ, δ]-linear combination of the generators of I. It follows that I = J ∩ ℝ[λ, σ, δ].

We continue to write I and J for the two ideals featuring in the proof just given. In more geometric language, the proposition and its proof show that I is the ideal of all polynomials vanishing identically on the projection of the graph of φG into the principal open subset of (λ, σ)-space where δ is defined.

Lemma 10. The parameter λuv is rationally identifiable if and only if I contains an element of the form a(σ)λuv − b(σ) with a, b ∈ ℝ[σ] and a not identically zero on the linear structural equation model given by the graph G.

In fact, in this case b will not be identically zero on the model, either.

Proof. By definition, if λuv is rationally identifiable, then there is a rational function b(σ)/a(σ) ∈ ℝ(σ) which upon substituting for σ the entries of (I − Λ)^{-T} Ω (I − Λ)^{-1} becomes equal to λuv; in particular, this substitution must be well-defined, so that a(σ) does not vanish identically on the model. This means that the polynomial a(σ)λuv − b(σ) lies in the ideal J of the graph of the parametrization. Since it only depends on λ and σ, it lies in I (see the proof of Proposition 5). Conversely, if a, b are as in the lemma, then b/a is a rational function identifying λuv from σ.

Lemma 10 yields an algorithm for checking rational identifiability of a graph G that is very close to that of [GPSS10], the main difference being that we use the equations in (A.1) rather than those in (A.2).

Algorithm 1 Check rational identifiability

(1) Make a list S containing all matrix entries of the left-hand side of (A.1), together with the additional polynomial δ · det(I − Λ) − 1, in which δ is treated as a variable.

(2) Choose a block monomial order ≥ on the monomials in the variables λ, σ, δ with δ > λ > σ; that is, when comparing two monomials, first compare the exponents of δ, and in case of a tie compare the λ-parts of the monomials, and in case of a tie compare the σ-parts.

(3) Compute a reduced Gröbner basis T with respect to ≥ of the ideal I generated by S.

(4) Then G is rationally identifiable if and only if for each (u, v) ∈ D the basis T contains an element whose leading monomial equals a monomial in σ times λuv.

Correctness of the algorithm. If T contains a polynomial fuv whose leading monomial equals λuv times a monomial in σ, then fuv is of the form a(σ)λuv − b(σ, λ), where b only contains λ-variables smaller than λuv. Moreover, a does not vanish identically on the model (or else a would be in I and hence fuv would not be reduced). Therefore, λuv can be rationally identified if all smaller λ-variables can. Hence, if we assume that T contains such a polynomial for all (u, v) ∈ D, then G is rationally identifiable.

Conversely, if λuv is rationally identified by b(σ)/a(σ), then a(σ)λuv − b(σ) ∈ I by Lemma 10. Replace a by its reduction modulo T; this reduction is non-zero since a does not vanish identically on the model, and it contains only the variables σ because of the choice of monomial order. Now the leading monomial of a(σ)λuv − b(σ) equals λuv times the leading monomial of a, and it is divisible by the leading monomial of some element f of T. Then f has leading monomial λuv times some monomial in σ, as required.
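As a small worked instance of Steps (1) to (4), consider the instrumental variable graph with directed edges 1 → 2 → 3 and the single bidirected edge 2 ↔ 3. This graph is our own illustrative choice and is not one of the examples discussed above; the computation is done by hand. The graph is acyclic, so det(I − Λ) = 1, and the pairs (v, w) ∉ B with v ≠ w are (1, 2) and (1, 3), up to symmetry.

```latex
\begin{align*}
\bigl[(I-\Lambda)^{T}\Sigma(I-\Lambda)\bigr]_{12} &= \sigma_{12} - \lambda_{12}\,\sigma_{11},
&
\bigl[(I-\Lambda)^{T}\Sigma(I-\Lambda)\bigr]_{13} &= \sigma_{13} - \lambda_{23}\,\sigma_{12},
\\[4pt]
S &= \bigl\{\, \sigma_{12} - \lambda_{12}\sigma_{11},\;\; \sigma_{13} - \lambda_{23}\sigma_{12},\;\; \delta - 1 \,\bigr\},
\\
T &= \bigl\{\, \delta - 1,\;\; \lambda_{12}\sigma_{11} - \sigma_{12},\;\; \lambda_{23}\sigma_{12} - \sigma_{13} \,\bigr\}.
\end{align*}
```

Under a block order with δ > λ > σ the leading monomials are δ, λ12σ11 and λ23σ12. They are pairwise coprime, so S is already a Gröbner basis by Buchberger's criterion, and normalizing signs gives the reduced basis T. Both edges in D have an element of T whose leading monomial is λuv times a monomial in σ, so Step (4) declares the graph rationally identifiable, with λ12 = σ12/σ11 and λ23 = σ13/σ12.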

The reduced Gröbner basis T contains more information than is used in Step (4) of the algorithm just described. Indeed, straightforward modifications of Step (4) can be used to test whether the parametrization is generically finite-to-one, and to find the degree of identifiability ID(G).

Figure 8. Cyclic graph (on the nodes 1, 2, 3, 4, 5) exemplifying the need to remove points (Λ, Ω) with det(I − Λ) = 0 in algebraic computations.

For large-scale computations such as those in Section 6, the presented algorithm is too involved. Instead, we used a randomized version in which the variables σ are replaced by the numerical values of the entries of randomly chosen matrices in the model. In other words, for random choices of Λ0 ∈ ℝ^D_reg and Ω0 ∈ PD(B), we compute a reduced Gröbner basis for the equation system

\[
\bigl[(I-\Lambda)^{T}\,\phi_G(\Lambda_0,\Omega_0)\,(I-\Lambda)\bigr]_{vw} = 0 \quad \forall (v,w) \notin B \text{ and } v \neq w, \tag{A.3}
\]
\[
\delta \cdot \det(I-\Lambda) - 1 = 0, \tag{A.4}
\]

under a block monomial order with δ > λ. The reduced Gröbner basis informs us about the dimension and cardinality of the solution set of (A.3) and (A.4) over ℂ^D_reg × ℂ^{m×m}_sym, and readily yields the degree of identifiability ID(G). In particular, the basis corresponds to a linear equation system with unique solution if and only if the graph is rationally identifiable. Formally, the claims in the last sentences hold with probability one, if (Λ0, Ω0) is drawn from a continuous probability distribution. In practice, we generate random integer-valued matrices that are then processed in a computer algebra system such as Singular [DGPS11]. To guard against occasional false conclusions from random draws that yield matrices in special position, we repeat the randomized calculation several times for each graph.
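The randomized strategy can be illustrated on a toy case without a computer algebra system. The sketch below uses the instrumental variable graph 1 → 2 → 3 with 2 ↔ 3, an example of our own choosing, for which the fiber equations (A.3) reduce to the two linear equations σ12 − λ12σ11 = 0 and σ13 − λ23σ12 = 0. No Gröbner basis is computed here, so this is a simplification and not the procedure that was run in Singular. All function names are hypothetical.

```python
import random
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def phi_iv(l12, l23, omega):
    """phi_G(Lambda, Omega) for the graph 1 -> 2 -> 3 with 2 <-> 3."""
    # Lambda is nilpotent here, so (I - Lambda)^{-1} = I + Lambda + Lambda^2.
    inv = [[1, l12, l12 * l23], [0, 1, l23], [0, 0, 1]]
    return matmul(matmul(transpose(inv), omega), inv)

def random_parameters(rng):
    """Integer-valued draw; l12 = 0 is excluded as a draw in special position."""
    l12 = rng.choice([-3, -2, -1, 1, 2, 3])
    l23 = rng.randint(-3, 3)
    w23 = rng.choice([-1, 0, 1])
    # Diagonal entries >= 2 keep the block [[w22, w23], [w23, w33]] positive definite.
    omega = [[rng.randint(1, 5), 0, 0],
             [0, rng.randint(2, 5), w23],
             [0, w23, rng.randint(2, 5)]]
    return l12, l23, omega

def solve_fiber(sigma):
    """Solve sigma12 - l12*sigma11 = 0 and sigma13 - l23*sigma12 = 0 exactly."""
    return (Fraction(sigma[0][1], sigma[0][0]),
            Fraction(sigma[0][2], sigma[0][1]))

def run_trials(n_trials, seed=0):
    """Repeat the randomized calculation and pair each truth with its recovery."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        l12, l23, omega = random_parameters(rng)
        sigma0 = phi_iv(l12, l23, omega)
        results.append(((l12, l23), solve_fiber(sigma0)))
    return results

trials = run_trials(5)
print(all(truth == recovered for truth, recovered in trials))  # True
```

Each draw yields a linear system with a unique solution that coincides with the generating coefficients, which is the behavior described above for a rationally identifiable graph. Excluding λ12 = 0 by hand plays the role that repeating the calculation plays in the text, namely avoiding matrices in special position.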

Finally, we stress with our last example that the equation (A.4) cannot be omitted when studying cyclic graphs, even when Λ0 is chosen to be in ℝ^D_reg.

Example 9. For the graph G in Figure 8, a run of Algorithm 1 without specializing values shows that the ideal in ℝ[λ, σ] generated by the equations (A.1) contains elements a12λ12 + b12, a14λ14 + b14, a23λ23 + b23 with the aij, bij polynomials in ℝ[σ] not vanishing identically on the model, but it does not contain similar elements a41λ41 + b41 or a32λ32 + b32. Furthermore, running a Gröbner basis computation on the fiber equations (A.3) with randomly specialized values yields that the fiber has multiplicity 3, and hence the ideal generated by these equations is not radical. Both of these issues disappear when introducing the auxiliary variable δ and imposing δ · det(I − Λ) − 1 = 0, with Σ specialized in the second case. Then the algorithm proves that G is rationally identifiable. In fact, G is HTC-identifiable, because Theorem 1 applies with the ordering 2 ≺ 3 ≺ 1 ≺ 4 ≺ 5 and the sets

Y2 = {1, 4}, Y3 = {2}, Y1 = {3}, Y4 = {1}, Y5 = ∅.

Acknowledgments

This collaboration was started at a workshop at the American Institute of Mathematics. We are grateful to Ilya Shpitser and Jin Tian for helpful comments about existing literature. Mathias Drton was supported by the NSF under Grant No. DMS-0746265 and by an Alfred P. Sloan Fellowship. Jan Draisma was supported by a Vidi grant from The Netherlands Organisation for Scientific Research (NWO).

References

[Bol89] Kenneth A. Bollen, Structural equations with latent variables, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons Inc., New York, 1989, A Wiley-Interscience Publication. MR996025 (90k:62001)

[BP02a] Carlos Brito and Judea Pearl, A graphical criterion for the identification of causal effects in linear models, Proceedings of the National Conference on Artificial Intelligence (AAAI), 2002, pp. 533–538.

[BP02b] Carlos Brito and Judea Pearl, A new identification condition for recursive models with correlated errors, Struct. Equ. Model. 9 (2002), no. 4, 459–474. MR1930449

[BP06] Carlos Brito and Judea Pearl, Graphical condition for identification in recursive SEM, Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (Rina Dechter and Thomas S. Richardson, eds.), AUAI Press, 2006, pp. 47–54.

[Bri04] Carlos Brito, Graphical methods for identification in structural equation models, Ph.D. thesis, UCLA Computer Science Department, 2004.

[CK10] Hei Chan and Manabu Kuroki, Using descendants as instrumental variables for the identification of direct causal effects in linear SEMs, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (Yee Whye Teh and Mike Titterington, eds.), J. Mach. Learn. Res. (JMLR), Workshop and Conference Proceedings, vol. 9, 2010, pp. 73–80.

[CLO07] David Cox, John Little, and Donal O’Shea, Ideals, varieties, and algorithms, third ed., Undergraduate Texts in Mathematics, Springer, New York, 2007, An introduction to computational algebraic geometry and commutative algebra. MR2290010 (2007h:13036)

[DFS11] Mathias Drton, Rina Foygel, and Seth Sullivant, Global identifiability of linear structural equation models, Ann. Statist. 39 (2011), no. 2, 865–886.

[DGPS11] Wolfram Decker, Gert-Martin Greuel, Gerhard Pfister, and Hans Schönemann, Singular 3-1-3 — A computer algebra system for polynomial computations, 2011, http://www.singular.uni-kl.de.

[DMS10] Vanessa Didelez, Sha Meng, and Nuala A. Sheehan, Assumptions of IV methods for observational epidemiology, Statist. Sci. 25 (2010), no. 1, 22–40.

[ER99] William N. Evans and Jeanne S. Ringel, Can higher cigarette taxes improve birth outcomes?, Journal of Public Economics 72 (1999), no. 1, 135–154.

[GPSS10] Luis D. Garcia-Puente, Sarah Spielvogel, and Seth Sullivant, Identifying causal effects with computer algebra, Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI) (Peter Grünwald and Peter Spirtes, eds.), AUAI Press, 2010.

[Oka73] Masashi Okamoto, Distinctness of the eigenvalues of a quadratic form in a multivariate sample, Ann. Statist. 1 (1973), 763–765. MR0331643 (48 #9975)

[Pea00] Judea Pearl, Causality, Cambridge University Press, Cambridge, 2000.

[R D11] R Development Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, 2011, ISBN 3-900051-07-0.

[RS02] Thomas Richardson and Peter Spirtes, Ancestral graph Markov models, Ann. Statist. 30 (2002), no. 4, 962–1030. MR1926166 (2003h:60017)

[Sch04] Alexander Schrijver, Combinatorial optimization. Polyhedra and efficiency, Algorithms and Combinatorics 24, vol. A, Springer, Berlin, 2004.

[SGS00] Peter Spirtes, Clark Glymour, and Richard Scheines, Causation, prediction, and search, second ed., Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2000. MR1815675 (2001j:62009)

[STD10] Seth Sullivant, Kelli Talaska, and Jan Draisma, Trek separation for Gaussian graphical models, Ann. Statist. 38 (2010), no. 3, 1665–1685.

[Tia05] Jin Tian, Identifying direct causal effects in linear models, Proceedings of the National Conference on Artificial Intelligence (AAAI), 2005, pp. 346–353.

[Tia09] Jin Tian, Parameter identification in a class of linear structural equation models, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2009, pp. 1970–1975.

[Wer11] Nanny Wermuth, Probability distributions with summary graph structure, Bernoulli 17 (2011), no. 3, 845–879.

[Wri21] Sewall Wright, Correlation and causation, J. Agricultural Research 20 (1921), 557–585.

[Wri34] Sewall Wright, The method of path coefficients, Ann. Math. Statist. 5 (1934), no. 3, 161–215.

Department of Statistics, The University of Chicago, Chicago, IL, U.S.A.

E-mail address: [email protected]

Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands; and Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands

E-mail address: [email protected]

Department of Statistics, The University of Chicago, Chicago, IL, U.S.A.

E-mail address: [email protected]