Open-World Probabilistic Databases Guy Van den Broeck GCAI Oct 21, 2017

Open-World Probabilistic Databases, starai.cs.ucla.edu/slides/GCAI17.pdf

Page 1

Open-World Probabilistic Databases

Guy Van den Broeck

GCAI

Oct 21, 2017

Page 2

Overview

1. Why probabilistic databases?

2. How probabilistic query evaluation?

3. Why open world?

4. How open-world query evaluation?

5. What is the broader picture? First-order model counting!

Page 3

Why probabilistic databases?

Page 5

What we’d like to do…

Page 6

Google Knowledge Graph

More than 570 million entities, more than 18 billion tuples

Page 7

Probabilistic Databases

• Tuple-independent probabilistic database
• Learned from the web, large text corpora, ontologies, etc., using statistical machine learning.

Coauthor:
x         y       P
Erdos     Renyi   0.6
Einstein  Pauli   0.7
Obama     Erdos   0.1

Scientist:
x         P
Erdos     0.9
Einstein  0.8
Pauli     0.6

[VdB&Suciu’17]

Page 8

Information Extraction is Noisy!

Coauthor:
x    y         P
Luc  Laura     0.7
Luc  Hendrik   0.6
Luc  Kathleen  0.3
Luc  Paol      0.3
Luc  Paolo     0.1

Page 11

What we’d like to do…

∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

Page 12

Einstein is in the Knowledge Graph

Page 13

Erdős is in the Knowledge Graph

Page 15

This guy is in the Knowledge Graph

… and he published with both Einstein and Erdos!

Page 17

Desired Query Answer

Ernst Straus
Barack Obama, …
Justin Bieber, …

1. Fuse uncertain information from the web ⇒ Embrace probability!
2. Cannot come from labeled data ⇒ Embrace query eval!

Page 18

[Chen’16] (NYTimes)

Page 19

How probabilistic query evaluation?

Page 23

Tuple-Independent Probabilistic DB

Probabilistic database D:

Coauthor:
x  y  P
A  B  p1
A  C  p2
B  C  p3

Possible worlds semantics: each tuple is present independently with its probability, so each of the 2^3 = 8 possible worlds has probability equal to the product of p_i over its present tuples and (1-p_i) over its absent tuples.

World                    Probability
{(A,B), (A,C), (B,C)}    p1p2p3
{(A,C), (B,C)}           (1-p1)p2p3
{(A,B), (B,C)}           p1(1-p2)p3
{(A,B), (A,C)}           p1p2(1-p3)
{(A,B)}                  p1(1-p2)(1-p3)
{(A,C)}                  (1-p1)p2(1-p3)
{(B,C)}                  (1-p1)(1-p2)p3
{} (empty)               (1-p1)(1-p2)(1-p3)

[VdB&Suciu’17]
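The possible-worlds semantics can be checked mechanically. Below is a minimal Python sketch, assuming hypothetical values p1 = 0.6, p2 = 0.7, p3 = 0.8 (the slide keeps them symbolic): it enumerates all 2^3 worlds of the Coauthor table, weights each by p or 1 - p per tuple, and computes a query probability as the total weight of the worlds where the query holds. This brute force is exponential in the number of tuples and only serves as a reference.

```python
from itertools import product

# Hypothetical tuple probabilities for the Coauthor table (p1, p2, p3).
coauthor = {("A", "B"): 0.6, ("A", "C"): 0.7, ("B", "C"): 0.8}


def possible_worlds(table):
    """Yield (world, probability) pairs for a tuple-independent table."""
    tuples = list(table)
    for present in product([True, False], repeat=len(tuples)):
        world = frozenset(t for t, keep in zip(tuples, present) if keep)
        prob = 1.0
        for t, keep in zip(tuples, present):
            # Each tuple is included independently with its own probability.
            prob *= table[t] if keep else 1.0 - table[t]
        yield world, prob


def query_prob(table, query):
    """P(query) = total probability of the worlds in which the query holds."""
    return sum(p for world, p in possible_worlds(table) if query(world))


worlds = list(possible_worlds(coauthor))
print(len(worlds))                # 8 worlds
print(sum(p for _, p in worlds))  # ≈ 1.0
# Example query: "A has some coauthor", i.e. ∃y Coauthor(A, y).
print(query_prob(coauthor, lambda w: any(x == "A" for x, _ in w)))  # ≈ 0.88
```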

Page 26

Probabilistic Database Queries

• Conjunctive queries (CQ): ∃ + ∧ + positive literals
  Example: ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)
• Unions of conjunctive queries (UCQ): ∨ of ∃ + ∧ + positive literals
• Duality
  – The negation of a CQ is a monotone ∀-clause
  – The negation of a UCQ is a monotone ∀-CNF
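As a worked instance of the duality, De Morgan's laws turn the negation of the example CQ into a ∀-clause whose literals are all negative (hence monotone). Applying the same step to each disjunct of a UCQ yields a conjunction of such clauses, i.e. a ∀-CNF, and with P(¬Q) = 1 - P(Q) evaluating one form is equivalent to evaluating the other.

```latex
\neg\bigl(\exists x\; \mathrm{Coauthor}(\mathit{Einstein},x) \wedge \mathrm{Coauthor}(\mathit{Erdos},x)\bigr)
\;\equiv\;
\forall x\; \bigl(\neg\mathrm{Coauthor}(\mathit{Einstein},x) \vee \neg\mathrm{Coauthor}(\mathit{Erdos},x)\bigr)
```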

Page 32

Probabilistic Query Evaluation

Q = ∃x∃y Scientist(x) ∧ Coauthor(x,y)

Scientist:
x  P   var
A  p1  X1
B  p2  X2
C  p3  X3

Coauthor:
x  y  P   var
A  D  q1  Y1
A  E  q2  Y2
B  F  q3  Y3
B  G  q4  Y4
B  H  q5  Y5

P(Q) = 1 - {1 - p1·[1 - (1-q1)·(1-q2)]} · {1 - p2·[1 - (1-q3)·(1-q4)·(1-q5)]}

(C has no coauthors, so it contributes a factor of 1.)
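To check the reconstructed formula, the sketch below compares it against brute-force enumeration of all 2^8 possible worlds. The numeric probabilities are hypothetical stand-ins for p1..p3 and q1..q5; the tuple variables X1..Y5 are not needed for this check.

```python
from itertools import product

# Hypothetical values for the symbolic probabilities on the slide.
scientist = {"A": 0.5, "B": 0.4, "C": 0.9}                     # p1, p2, p3
coauthor = {("A", "D"): 0.3, ("A", "E"): 0.6,                  # q1, q2
            ("B", "F"): 0.2, ("B", "G"): 0.5, ("B", "H"): 0.7}  # q3, q4, q5


def closed_form():
    """1 - {1 - p1[...]} * {1 - p2[...]}; C has no coauthors, so it drops out."""
    p1, p2 = scientist["A"], scientist["B"]
    q1, q2, q3, q4, q5 = coauthor.values()
    a = p1 * (1 - (1 - q1) * (1 - q2))
    b = p2 * (1 - (1 - q3) * (1 - q4) * (1 - q5))
    return 1 - (1 - a) * (1 - b)


def brute_force():
    """Sum the probabilities of all possible worlds in which Q holds."""
    facts = [("S", x, p) for x, p in scientist.items()] + \
            [("C", xy, p) for xy, p in coauthor.items()]
    total = 0.0
    for present in product([True, False], repeat=len(facts)):
        prob, sci, co = 1.0, set(), set()
        for (rel, key, p), keep in zip(facts, present):
            prob *= p if keep else 1 - p
            if keep:
                (sci if rel == "S" else co).add(key)
        # Q = ∃x∃y Scientist(x) ∧ Coauthor(x,y)
        if any(x in sci for x, _ in co):
            total += prob
    return total


print(closed_form(), brute_force())  # the two values agree up to rounding
```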

Page 37

Lifted Inference Rules

Preprocess Q (omitted), then apply rules (some have preconditions):

Negation:
P(¬Q) = 1 - P(Q)

Decomposable ∧, ∨ (precondition: Q1 and Q2 are independent):
P(Q1 ∧ Q2) = P(Q1) P(Q2)
P(Q1 ∨ Q2) = 1 - (1 - P(Q1)) (1 - P(Q2))

Decomposable ∃, ∀ (precondition: Q[A/z] and Q[B/z] are independent for A ≠ B):
P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z])
P(∃z Q) = 1 - Π_{A ∈ Domain} (1 - P(Q[A/z]))

Inclusion/exclusion:
P(Q1 ∧ Q2) = P(Q1) + P(Q2) - P(Q1 ∨ Q2)
P(Q1 ∨ Q2) = P(Q1) + P(Q2) - P(Q1 ∧ Q2)

[VdB&Suciu’17]
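A minimal sketch of the rules as plain Python functions over probabilities. The function names are hypothetical, and the functions only apply the formulas: they do not check the preconditions (independence of Q1 and Q2, or of the groundings Q[A/z]), which a lifted inference engine has to establish on the query itself before using them.

```python
from math import prod


def p_not(p):
    """Negation: P(¬Q) = 1 - P(Q)."""
    return 1.0 - p


def p_and_indep(p1, p2):
    """Decomposable ∧; caller guarantees Q1 and Q2 are independent."""
    return p1 * p2


def p_or_indep(p1, p2):
    """Decomposable ∨; caller guarantees Q1 and Q2 are independent."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)


def p_forall(ps):
    """Decomposable ∀: ps[i] = P(Q[A_i/z]) for independent groundings."""
    return prod(ps)


def p_exists(ps):
    """Decomposable ∃: 1 - Π (1 - P(Q[A_i/z])) for independent groundings."""
    return 1.0 - prod(1.0 - p for p in ps)


def p_and_incl_excl(p1, p2, p_or):
    """Inclusion/exclusion: P(Q1 ∧ Q2) = P(Q1) + P(Q2) - P(Q1 ∨ Q2)."""
    return p1 + p2 - p_or


# For independent Q1, Q2 both routes to P(Q1 ∧ Q2) give ≈ 0.15.
print(p_and_indep(0.3, 0.5), p_and_incl_excl(0.3, 0.5, p_or_indep(0.3, 0.5)))
```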

Page 42

Closed-World Lifted Query Eval

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

P(Q) = 1 - Π_{A ∈ Domain} (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))

Decomposable ∃-Rule

Check independence:

Scientist(A) ∧ ∃y Coauthor(A,y)
Scientist(B) ∧ ∃y Coauthor(B,y)

These groundings share no tuples, so they are independent:

= 1 - (1 - P(Scientist(A) ∧ ∃y Coauthor(A,y)))
    × (1 - P(Scientist(B) ∧ ∃y Coauthor(B,y)))
    × (1 - P(Scientist(C) ∧ ∃y Coauthor(C,y)))
    × (1 - P(Scientist(D) ∧ ∃y Coauthor(D,y)))
    × (1 - P(Scientist(E) ∧ ∃y Coauthor(E,y)))
    × (1 - P(Scientist(F) ∧ ∃y Coauthor(F,y)))

Complexity: PTIME
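The derivation above translates into a single pass over the tables. The sketch assumes the closed-world reading of the slide (tuples absent from a table have probability 0) and uses hypothetical probabilities; `lifted_prob` is an illustrative name, not an existing API. Grouping Coauthor by x computes each inner ∃y factor once, so the running time is linear in the number of tuples, consistent with the PTIME claim.

```python
from collections import defaultdict
from math import prod


def lifted_prob(scientist, coauthor):
    """P(∃x∃y Scientist(x) ∧ Coauthor(x,y)) via the decomposable ∃ and ∧ rules.

    scientist: dict x -> P(Scientist(x)); coauthor: dict (x, y) -> P(Coauthor(x, y)).
    Closed world: any tuple not listed has probability 0.
    """
    # Inner ∃y: P(no Coauthor(A, y) tuple holds) = Π_y (1 - P(Coauthor(A, y))).
    none_for = defaultdict(lambda: 1.0)
    for (x, _y), q in coauthor.items():
        none_for[x] *= 1.0 - q
    # Outer ∃x over independent groundings; constants without coauthors
    # contribute a factor of 1, so they can be skipped.
    return 1.0 - prod(
        1.0 - p * (1.0 - none_for[x]) for x, p in scientist.items() if x in none_for
    )


scientist = {"A": 0.5, "B": 0.4, "C": 0.9}
coauthor = {("A", "D"): 0.3, ("A", "E"): 0.6,
            ("B", "F"): 0.2, ("B", "G"): 0.5, ("B", "H"): 0.7}
print(lifted_prob(scientist, coauthor))  # ≈ 0.58528
```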

Page 45

Limitations

H0 = ∀x∀y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)

The decomposable ∀-rule:

P(∀z Q) = Π_{A ∈ Domain} P(Q[A/z])

… does not apply: H0[Alice/x] and H0[Bob/x] are dependent:

∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y))
∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y))

Dependent: both groundings share the atoms Jogger(y).

Lifted inference sometimes fails.
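The dependence can be made concrete on a tiny domain. Assuming every Smoker, Jogger and Friend tuple over {Alice, Bob} has probability 0.5 (hypothetical values), the sketch computes P(H0) exactly by enumerating all 2^8 worlds and compares it with the product of P(H0[Alice/x]) and P(H0[Bob/x]) that the decomposable ∀-rule would give.

```python
from itertools import product

PEOPLE = ["Alice", "Bob"]
# Hypothetical: every Smoker, Jogger and Friend tuple has probability 0.5.
FACTS = ([("Smoker", a) for a in PEOPLE] + [("Jogger", a) for a in PEOPLE]
         + [("Friend", a, b) for a in PEOPLE for b in PEOPLE])
P = 0.5


def prob(holds):
    """Brute-force P(formula): sum over all 2^len(FACTS) possible worlds."""
    total = 0.0
    for bits in product([True, False], repeat=len(FACTS)):
        world = {f for f, b in zip(FACTS, bits) if b}
        if holds(world):
            total += P ** sum(bits) * (1 - P) ** (len(bits) - sum(bits))
    return total


def h0_for(x, world):
    """H0 with x fixed: ∀y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)."""
    return all(("Smoker", x) in world or ("Friend", x, y) in world
               or ("Jogger", y) in world for y in PEOPLE)


exact = prob(lambda w: all(h0_for(x, w) for x in PEOPLE))
naive = 1.0
for x in PEOPLE:
    naive *= prob(lambda w: h0_for(x, w))  # wrongly assumes independence

print(exact, naive)  # 0.62890625 0.6103515625
```

The exact value is larger because both groundings become easier to satisfy when Jogger(y) is true, so they are positively correlated.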

Page 47

Background: Positive Partitioned 2CNF

A PP2CNF is:

F = ∧_{(i,j) ∈ E} (xi ∨ yj)

where E is the edge set of a bipartite graph.

[Figure: bipartite graph with nodes x1, x2 on one side and y1, y2, y3 on the other]

F = (x1 ∨ y1) ∧ (x2 ∨ y1) ∧ (x2 ∨ y3) ∧ (x1 ∨ y3) ∧ (x2 ∨ y2)

Theorem: #PP2CNF is #P-hard [Provan’83]
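For the example formula, #F can be computed by brute force. The sketch below (a hypothetical helper, not from the talk) enumerates all assignments; its running time is exponential in the number of variables, and the theorem says #PP2CNF is #P-hard in general.

```python
from itertools import product

# The slide's example: one clause (x_i ∨ y_j) per edge (i, j) of the bipartite graph.
EDGES = [(1, 1), (2, 1), (2, 3), (1, 3), (2, 2)]
X_VARS, Y_VARS = [1, 2], [1, 2, 3]


def count_models(edges, xs, ys):
    """#PP2CNF by brute force: exponential in len(xs) + len(ys)."""
    count = 0
    for xbits in product([False, True], repeat=len(xs)):
        x = dict(zip(xs, xbits))
        for ybits in product([False, True], repeat=len(ys)):
            y = dict(zip(ys, ybits))
            if all(x[i] or y[j] for i, j in edges):
                count += 1
    return count


print(count_models(EDGES, X_VARS, Y_VARS))  # 12 of the 32 assignments
```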

Page 53

Our Problematic Clause

H0 = ∀x∀y Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)

Theorem. Computing P(H0) is #P-hard in the size of the database. [Dalvi&Suciu’04]

Proof: given a PP2CNF F = (Xi1 ∨ Yj1) ∧ (Xi2 ∨ Yj2) ∧ …, reduce #F to computing P(H0).

By example: F = (X1 ∨ Y1) ∧ (X1 ∨ Y2) ∧ (X2 ∨ Y2)

Smoker:
X   P
x1  0.5
x2  0.5

Jogger:
Y   P
y1  0.5
y2  0.5

Friend:
X   Y   P
x1  y1  0
x1  y2  0
x2  y2  0

Probabilities (tuples not shown have P=1)

For pairs (x,y) that are not edges of F, Friend(x,y) is certain and the clause of H0 always holds; for the edges, the clause reduces to Smoker(x) ∨ Jogger(y), i.e. to a clause of F. Hence P(H0) = P(F) = #F / 2^4, so computing P(H0) is #P-hard.
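The reduction can be checked on the slide's example with a short brute-force sketch. It builds the database as in the tables (Smoker and Jogger tuples at 0.5, Friend at 0 on the edges of F and 1 elsewhere), computes P(H0) by enumeration, and compares 2^4 · P(H0) with a direct count of the models of F. The helper names are hypothetical.

```python
from itertools import product

# The slide's example F = (X1 ∨ Y1) ∧ (X1 ∨ Y2) ∧ (X2 ∨ Y2), as bipartite edges.
EDGES = {("x1", "y1"), ("x1", "y2"), ("x2", "y2")}
XS, YS = ["x1", "x2"], ["y1", "y2"]


def p_h0():
    """P(H0) on the reduction's database by brute force.

    Only Smoker/Jogger tuples are uncertain (P = 0.5). Friend(x, y) is absent
    (P = 0) on edges of F and present (P = 1) everywhere else.
    """
    total = 0.0
    for bits in product([False, True], repeat=len(XS) + len(YS)):
        smoker = dict(zip(XS, bits[:len(XS)]))
        jogger = dict(zip(YS, bits[len(XS):]))
        if all(smoker[x] or (x, y) not in EDGES or jogger[y]
               for x in XS for y in YS):
            total += 0.5 ** len(bits)
    return total


def count_models_f():
    """#F directly: assignments to X1, X2, Y1, Y2 satisfying every clause."""
    count = 0
    for bits in product([False, True], repeat=len(XS) + len(YS)):
        val = dict(zip(XS + YS, bits))
        if all(val[x] or val[y] for x, y in EDGES):
            count += 1
    return count


print(p_h0(), p_h0() * 2 ** 4, count_models_f())  # 0.5 8.0 8
```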

Page 56

Are the Lifted Rules Complete?

You already know:

• Inference rules: PTIME data complexity
• Some queries: #P-hard data complexity

Dichotomy Theorem for UCQ / Mon. CNF

• If lifted rules succeed, then the query is in PTIME
• If lifted rules fail, then the query is #P-hard

Lifted rules are complete for UCQ!

[Dalvi and Suciu; JACM’11]

Page 58

Why open world?

Page 59

Knowledge Base Completion

Given:

Coauthor:
x         y       P
Einstein  Straus  0.7
Erdos     Straus  0.6
Einstein  Pauli   0.9
…         …       …

Learn:

0.8::Coauthor(x,y) :- Coauthor(z,x) ∧ Coauthor(z,y).

Complete:

Coauthor:
x       y      P
Straus  Pauli  0.504
…       …      …

(0.504 = 0.8 × 0.7 × 0.9: the rule applied with z = Einstein, x = Straus, y = Pauli.)
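A minimal sketch of one completion step, under stated assumptions: ProbLog-style semantics in which each ground instance of the rule fires independently with probability 0.8, a single non-recursive application (derived tuples are not fed back into the rule), and noisy-or combination of derivations through different z, which is only exact when those derivations share no uncertain facts. The function name `apply_rule_once` is hypothetical.

```python
from collections import defaultdict


def apply_rule_once(facts, rule_prob):
    """One step of rule_prob::Coauthor(x,y) :- Coauthor(z,x) ∧ Coauthor(z,y).

    facts: dict (a, b) -> P(Coauthor(a, b)). Returns dict (x, y) -> P(derived).
    Derivations via different z are combined by noisy-or, which assumes
    they share no uncertain facts.
    """
    by_z = defaultdict(list)
    for (z, x), p in facts.items():
        by_z[z].append((x, p))
    no_derivation = defaultdict(lambda: 1.0)
    for pairs in by_z.values():
        for x, px in pairs:
            for y, py in pairs:
                if x != y:
                    no_derivation[(x, y)] *= 1.0 - rule_prob * px * py
    return {xy: 1.0 - q for xy, q in no_derivation.items()}


coauthor = {("Einstein", "Straus"): 0.7, ("Erdos", "Straus"): 0.6,
            ("Einstein", "Pauli"): 0.9}
derived = apply_rule_once(coauthor, 0.8)
print(round(derived[("Straus", "Pauli")], 3))  # 0.504 = 0.8 * 0.7 * 0.9
```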

Page 60

Bayesian Learning Loop

Bayesian view on learning:

1. Prior belief:
   P(Coauthor(Straus,Pauli)) = 0.01
2. Observe page
   P(Coauthor(Straus,Pauli) | page 1) = 0.2
3. Observe page
   P(Coauthor(Straus,Pauli) | page 1, page 2) = 0.3

Principled and sound reasoning!

Page 63

Problem: Broken Learning Loop

Bayesian view on learning:

1. Prior belief:
   P(Coauthor(Straus,Pauli)) = 0
2. Observe page
   P(Coauthor(Straus,Pauli) | page 1) = 0.2
3. Observe page
   P(Coauthor(Straus,Pauli) | page 1, page 2) = 0.3

[Ceylan, Darwiche, Van den Broeck; KR’16]

This is mathematical nonsense! By Bayes' rule, a prior of 0 forces every posterior to be 0.
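The failure follows directly from Bayes' rule: the posterior is proportional to the prior, so a prior of exactly 0 stays 0 whatever is observed. In the sketch below the likelihoods are hypothetical numbers chosen only to illustrate the update.

```python
def posterior(prior, lik_if_true, lik_if_false):
    """Bayes' rule for a binary fact: P(fact | evidence)."""
    evidence = prior * lik_if_true + (1.0 - prior) * lik_if_false
    return prior * lik_if_true / evidence


# Hypothetical likelihoods of observing a web page that mentions the collaboration.
lik_true, lik_false = 0.9, 0.05
print(posterior(0.01, lik_true, lik_false))  # rises above the 0.01 prior
print(posterior(0.0, lik_true, lik_false))   # 0.0: a zero prior can never be updated
```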

Page 64

What we’d like to do…

∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)

Ernst Straus

Kristian Kersting, …

Justin Bieber, …

Page 69

Open World DB

• What if a fact is missing?
• Probability 0 for:

Coauthor:
X         Y          P
Einstein  Straus     0.7
Erdos     Straus     0.6
Einstein  Pauli      0.9
Erdos     Renyi      0.7
Kersting  Natarajan  0.8
Luc       Paol       0.1
…         …          …

Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)
Q2 = ∃x Coauthor(Bieber,x) ∧ Coauthor(Erdos,x)
Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)
Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber)
Q5 = Coauthor(Einstein,Bieber) ∧ ¬Coauthor(Einstein,Bieber)
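Under the closed-world assumption, these queries can be evaluated on the visible rows with the lifted rules. The sketch below is illustrative: it ignores the "…" rows and assumes Bieber appears in no tuple. Q5 is set to 0 directly because it is a contradiction; multiplying P(t) by 1 - P(t) would be wrong, since both literals mention the same tuple.

```python
# Only the rows visible on the slide; the "…" rows are not modelled here.
COAUTHOR = {("Einstein", "Straus"): 0.7, ("Erdos", "Straus"): 0.6,
            ("Einstein", "Pauli"): 0.9, ("Erdos", "Renyi"): 0.7,
            ("Kersting", "Natarajan"): 0.8, ("Luc", "Paol"): 0.1}


def p(a, b):
    """Closed-world assumption: a tuple missing from the table has probability 0."""
    return COAUTHOR.get((a, b), 0.0)


def p_common_coauthor(a, b):
    """P(∃x Coauthor(a,x) ∧ Coauthor(b,x)) with the decomposable ∃ and ∧ rules."""
    # Only constants in the second column can make both atoms non-zero.
    domain = {y for _, y in COAUTHOR}
    none = 1.0
    for x in domain:
        none *= 1.0 - p(a, x) * p(b, x)
    return 1.0 - none


q1 = p_common_coauthor("Einstein", "Erdos")
q2 = p_common_coauthor("Bieber", "Erdos")
q3 = p("Einstein", "Straus") * p("Erdos", "Straus")
q4 = p("Einstein", "Bieber") * p("Erdos", "Bieber")
q5 = 0.0  # A ∧ ¬A is false in every possible world
print(q1, q2, q3, q4, q5)  # Q2, Q4 and Q5 all get probability 0
```

Q2 and Q4 receive exactly the same probability as the contradiction Q5.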

Page 74

Intuition

Coauthor:
X         Y          P
Einstein  Straus     0.7
Erdos     Straus     0.6
Einstein  Pauli      0.9
Erdos     Renyi      0.7
Kersting  Natarajan  0.8
Luc       Paol       0.1
…         …          …

We know for sure that P(Q1) ≥ P(Q3) and P(Q1) ≥ P(Q4), since Q3 and Q4 each entail Q1,
and that P(Q3) ≥ P(Q5) and P(Q4) ≥ P(Q5), because P(Q5) = 0.

We have strong evidence that P(Q1) ≥ P(Q2).

Q1 = ∃x Coauthor(Einstein,x) ∧ Coauthor(Erdos,x)
Q2 = ∃x Coauthor(Bieber,x) ∧ Coauthor(Erdos,x)
Q3 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)
Q4 = Coauthor(Einstein,Bieber) ∧ Coauthor(Erdos,Bieber)
Q5 = Coauthor(Einstein,Bieber) ∧ ¬Coauthor(Einstein,Bieber)

[Ceylan, Darwiche, Van den Broeck; KR’16]

Page 78: Open-World Probabilistic Databasesstarai.cs.ucla.edu/slides/GCAI17.pdfGoogle Knowledge Graph > 570 million entities > 18 billion tuples • Tuple-independent probabilistic database

Problem: Curse of Superlinearity

Reality is worse: tuples are intentionally missing!

Sibling
x   y   P
…   …   …

Facebook scale: storing a Sibling probability for every pair of users ⇒ 200 exabytes of data

All Google storage is 2 exabytes…

[Ceylan, Darwiche, Van den Broeck; KR’16]
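A back-of-the-envelope sketch of why a dense binary relation explodes. The user count and bytes-per-tuple below are hypothetical values chosen for illustration (the slide does not state its inputs); with them the arithmetic happens to land on 200 exabytes. The point that survives any choice of inputs is the quadratic growth.

```python
EXABYTE = 10**18  # bytes (decimal SI exabyte)


def dense_relation_bytes(n_entities: int, bytes_per_tuple: int) -> int:
    """Bytes needed to store one probability for every (x, y) pair."""
    return n_entities * n_entities * bytes_per_tuple


# Hypothetical inputs (not from the slide): about 2 billion users and
# about 50 bytes per stored (x, y, P) tuple.
n_users = 2_000_000_000
size = dense_relation_bytes(n_users, 50)
print(size // EXABYTE, "exabytes")  # 200 exabytes under these assumptions

# Superlinearity: doubling the number of users quadruples the storage.
assert dense_relation_bytes(2 * n_users, 50) == 4 * size
```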

Page 81

Problem: Model Evaluation

Given:

Coauthor
x          y          P
Einstein   Straus     0.7
Erdos      Straus     0.6
Einstein   Pauli      0.9
…          …          …

Learn:

0.8::Coauthor(x,y) :- Coauthor(z,x) ∧ Coauthor(z,y).

OR

0.6::Coauthor(x,y) :- Affiliation(x,z) ∧ Affiliation(y,z).

What is the likelihood, precision, accuracy, …?

[De Raedt et al; IJCAI’15]

Page 84

Open-World Prob. Databases

Intuition: tuples can be added with P ≤ λ

Q2 = Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus)

Coauthor
X          Y           P
Einstein   Straus      0.7
Einstein   Pauli       0.9
Erdos      Renyi       0.7
Kersting   Natarajan   0.8
Luc        Paol        0.1
…          …           …

Coauthor, with a missing tuple added
X          Y           P
Einstein   Straus      0.7
Einstein   Pauli       0.9
Erdos      Renyi       0.7
Kersting   Natarajan   0.8
Luc        Paol        0.1
…          …           …
Erdos      Straus      λ

0.7 * λ ≥ P(Q2) ≥ 0

In the closed world Coauthor(Erdos,Straus) is absent, so P(Q2) = 0; adding it with any probability up to λ raises P(Q2) to at most 0.7 * λ.
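Under the open-world reading, the bound above is a two-line computation. This is a sketch assuming the stored value 0.7 and a missing Coauthor(Erdos,Straus) whose probability may be anything in [0, λ]; because Q2 is monotone, the extremes are reached at 0 and at λ. The function name is hypothetical.

```python
def q2_bounds(lam: float) -> tuple:
    """Bounds on P(Coauthor(Einstein,Straus) ∧ Coauthor(Erdos,Straus))."""
    p_einstein_straus = 0.7  # stored tuple
    # Coauthor(Erdos,Straus) is missing: any probability in [0, lam] is allowed.
    lower = p_einstein_straus * 0.0  # closed world: missing tuple is false
    upper = p_einstein_straus * lam  # query is monotone, so maximize at lam
    return lower, upper


print(q2_bounds(0.1))
```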

Page 85

How open-world query evaluation?

Page 88

UCQ / Monotone CNF

• Lower bound = closed-world probability

• Upper bound = probability after adding all missing tuples with probability λ

• Polynomial time ☺

• Quadratic blow-up: a binary relation over n constants gets all n² tuples

• 200 exabytes … again
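The two bullets can be checked by brute force on a toy instance. This Python sketch is illustrative only: the relation R, the three-constant domain, the tuple probabilities and λ = 0.1 are hypothetical, and enumerating possible worlds is exponential, unlike the polynomial-time evaluation the slide refers to. It also shows the blow-up, since the completed table has |Domain|² tuples.

```python
import itertools


def query_prob(db: dict, query) -> float:
    """Exact P(query) on a tuple-independent DB by enumerating all worlds."""
    facts = list(db)
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(facts)):
        weight = 1.0
        for fact, present in zip(facts, bits):
            weight *= db[fact] if present else 1.0 - db[fact]
        world = {fact for fact, present in zip(facts, bits) if present}
        if query(world):
            total += weight
    return total


def complete(db: dict, domain: list, lam: float) -> dict:
    """Add every missing R(x, y) tuple with probability lam."""
    full = {(x, y): lam for x in domain for y in domain}
    full.update(db)
    return full


DOMAIN = ["a", "b", "c"]


def path_of_length_two(world) -> bool:
    """Monotone query ∃x ∃y ∃z R(x,y) ∧ R(y,z)."""
    return any((x, y) in world and (y, z) in world
               for x in DOMAIN for y in DOMAIN for z in DOMAIN)


# Hypothetical closed-world table for a binary relation R.
db = {("a", "b"): 0.5, ("b", "c"): 0.3}
lam = 0.1

lower = query_prob(db, path_of_length_two)                          # closed world
upper = query_prob(complete(db, DOMAIN, lam), path_of_length_two)   # missing at lam
print(lower, upper, len(complete(db, DOMAIN, lam)))
```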

Page 93

Closed-World Lifted Query Eval

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

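Decomposable ∃-Rule:

$$
\begin{aligned}
P(Q) &= 1 - \prod_{A \in \mathrm{Domain}} \bigl(1 - P(\mathrm{Scientist}(A) \wedge \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&= 1 - \bigl(1 - P(\mathrm{Scientist}(A) \wedge \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(B) \wedge \exists y\, \mathrm{Coauthor}(B,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(C) \wedge \exists y\, \mathrm{Coauthor}(C,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(D) \wedge \exists y\, \mathrm{Coauthor}(D,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(E) \wedge \exists y\, \mathrm{Coauthor}(E,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(F) \wedge \exists y\, \mathrm{Coauthor}(F,y))\bigr)
\end{aligned}
$$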

Complexity PTIME

Check independence: Scientist(A) ∧ ∃y Coauthor(A,y) and Scientist(B) ∧ ∃y Coauthor(B,y) share no tuples, so they are independent.
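The decomposable ∃-rule can be sanity-checked against possible-world enumeration. In this hypothetical two-constant instance (all probabilities invented), lifted() applies the product formula above, while grounded() sums the weights of all 64 worlds; the two should agree.

```python
import itertools

DOMAIN = ["A", "B"]
# Hypothetical tuple probabilities; every ground atom is listed explicitly.
scientist = {"A": 0.6, "B": 0.3}
coauthor = {("A", "A"): 0.1, ("A", "B"): 0.5, ("B", "A"): 0.2, ("B", "B"): 0.0}


def lifted() -> float:
    """Decomposable ∃-rule: the sub-queries for different x share no tuples."""
    none_holds = 1.0
    for a in DOMAIN:
        no_coauthor = 1.0
        for b in DOMAIN:
            no_coauthor *= 1.0 - coauthor[(a, b)]
        # P(Scientist(a) ∧ ∃y Coauthor(a,y)): the two conjuncts are independent.
        none_holds *= 1.0 - scientist[a] * (1.0 - no_coauthor)
    return 1.0 - none_holds


def grounded() -> float:
    """Reference answer: sum the weights of all worlds that satisfy Q."""
    atoms = [("S", a) for a in DOMAIN] + [("C", ab) for ab in coauthor]
    prob = {("S", a): p for a, p in scientist.items()}
    prob.update({("C", ab): p for ab, p in coauthor.items()})
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(atoms)):
        world = {atom for atom, on in zip(atoms, bits) if on}
        weight = 1.0
        for atom, on in zip(atoms, bits):
            weight *= prob[atom] if on else 1.0 - prob[atom]
        if any(("S", a) in world and ("C", (a, b)) in world
               for a in DOMAIN for b in DOMAIN):
            total += weight
    return total


print(lifted(), grounded())
```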

Page 98

Closed-World Lifted Query Eval

No supporting facts in database!

Probability 0 in closed world

Ignore these sub-queries!

Complexity linear time!

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

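$$
\begin{aligned}
P(Q) &= 1 - \prod_{A \in \mathrm{Domain}} \bigl(1 - P(\mathrm{Scientist}(A) \wedge \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&= 1 - \bigl(1 - P(\mathrm{Scientist}(A) \wedge \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(B) \wedge \exists y\, \mathrm{Coauthor}(B,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(C) \wedge \exists y\, \mathrm{Coauthor}(C,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(D) \wedge \exists y\, \mathrm{Coauthor}(D,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(E) \wedge \exists y\, \mathrm{Coauthor}(E,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(F) \wedge \exists y\, \mathrm{Coauthor}(F,y))\bigr)
\end{aligned}
$$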
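Ignoring unsupported sub-queries amounts to iterating over stored tuples instead of over the domain. A minimal sketch under the closed-world assumption, with hypothetical data over the constants A to F: closed_world_sparse makes one pass over each table, and closed_world_dense evaluates the full product for comparison. Both function names are illustrative.

```python
from collections import defaultdict


def closed_world_sparse(scientist: dict, coauthor: dict) -> float:
    """P(∃x ∃y Scientist(x) ∧ Coauthor(x,y)) touching only stored tuples.

    A constant with no Scientist tuple or no Coauthor tuple contributes a
    factor 1 - 0 = 1 under the closed world, so it is skipped. One pass over
    each table: linear in the size of the database.
    """
    no_coauthor = defaultdict(lambda: 1.0)
    for (a, _b), p in coauthor.items():
        no_coauthor[a] *= 1.0 - p
    none_holds = 1.0
    for a, p_sci in scientist.items():
        if a in no_coauthor:  # supported in both relations
            none_holds *= 1.0 - p_sci * (1.0 - no_coauthor[a])
    return 1.0 - none_holds


def closed_world_dense(scientist: dict, coauthor: dict, domain: list) -> float:
    """Same formula, iterating over every constant and every pair."""
    none_holds = 1.0
    for a in domain:
        no_coauthor = 1.0
        for b in domain:
            no_coauthor *= 1.0 - coauthor.get((a, b), 0.0)
        none_holds *= 1.0 - scientist.get(a, 0.0) * (1.0 - no_coauthor)
    return 1.0 - none_holds


domain = ["A", "B", "C", "D", "E", "F"]
scientist = {"A": 0.9, "C": 0.5}  # hypothetical stored tuples
coauthor = {("A", "B"): 0.7, ("A", "D"): 0.4, ("E", "F"): 0.8}
print(closed_world_sparse(scientist, coauthor),
      closed_world_dense(scientist, coauthor, domain))
```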

Page 101

Open-World Lifted Query Eval

No supporting facts in database!

Probability λ (instead of 0) for their missing tuples in open world

Complexity PTIME!

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

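$$
\begin{aligned}
P(Q) &= 1 - \prod_{A \in \mathrm{Domain}} \bigl(1 - P(\mathrm{Scientist}(A) \wedge \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&= 1 - \bigl(1 - P(\mathrm{Scientist}(A) \wedge \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(B) \wedge \exists y\, \mathrm{Coauthor}(B,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(C) \wedge \exists y\, \mathrm{Coauthor}(C,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(D) \wedge \exists y\, \mathrm{Coauthor}(D,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(E) \wedge \exists y\, \mathrm{Coauthor}(E,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(F) \wedge \exists y\, \mathrm{Coauthor}(F,y))\bigr)
\end{aligned}
$$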

Page 104

Open-World Lifted Query Eval

No supporting facts in database!

Each such sub-query has the same probability p in open world

All k of them together: probability (1-p)^k

Exploit symmetry

Lifted inference

Complexity linear time!

Q = ∃x ∃y Scientist(x) ∧ Coauthor(x,y)

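$$
\begin{aligned}
P(Q) &= 1 - \prod_{A \in \mathrm{Domain}} \bigl(1 - P(\mathrm{Scientist}(A) \wedge \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&= 1 - \bigl(1 - P(\mathrm{Scientist}(A) \wedge \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(B) \wedge \exists y\, \mathrm{Coauthor}(B,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(C) \wedge \exists y\, \mathrm{Coauthor}(C,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(D) \wedge \exists y\, \mathrm{Coauthor}(D,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(E) \wedge \exists y\, \mathrm{Coauthor}(E,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(F) \wedge \exists y\, \mathrm{Coauthor}(F,y))\bigr)
\end{aligned}
$$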
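In the open world, every constant with no stored tuples contributes the same factor, which is what makes the (1-p)^k step possible. The sketch below assumes that every missing tuple is set to λ (the upper bound for this monotone query) and that n is the domain size; under those assumptions an unsupported constant has p = λ(1 - (1 - λ)^n). The constants, tuples, λ and function names are hypothetical, and open_world_dense is a reference that loops over all n² pairs.

```python
from collections import defaultdict


def open_world_lifted(scientist, coauthor, n, lam):
    """Upper bound on P(∃x ∃y Scientist(x) ∧ Coauthor(x,y)), missing tuples at lam.

    Constants with stored tuples are handled one by one; the k constants with
    no stored tuples are all symmetric, so their joint factor is (1 - p)^k.
    """
    known = defaultdict(list)
    for (a, _b), p in coauthor.items():
        known[a].append(p)
    supported = set(scientist) | set(known)

    none_holds = 1.0
    for a in supported:
        no_coauthor = (1.0 - lam) ** (n - len(known[a]))  # missing Coauthor(a, y)
        for p in known[a]:
            no_coauthor *= 1.0 - p
        none_holds *= 1.0 - scientist.get(a, lam) * (1.0 - no_coauthor)

    k = n - len(supported)
    p = lam * (1.0 - (1.0 - lam) ** n)  # same value for every unsupported constant
    none_holds *= (1.0 - p) ** k
    return 1.0 - none_holds


def open_world_dense(scientist, coauthor, domain, lam):
    """Reference: complete every missing tuple with lam and loop over all pairs."""
    none_holds = 1.0
    for a in domain:
        no_coauthor = 1.0
        for b in domain:
            no_coauthor *= 1.0 - coauthor.get((a, b), lam)
        none_holds *= 1.0 - scientist.get(a, lam) * (1.0 - no_coauthor)
    return 1.0 - none_holds


domain = [f"c{i}" for i in range(50)]
scientist = {"c0": 0.9, "c5": 0.4}  # hypothetical stored tuples
coauthor = {("c0", "c1"): 0.7, ("c2", "c3"): 0.4}
lam = 0.01
print(open_world_lifted(scientist, coauthor, len(domain), lam),
      open_world_dense(scientist, coauthor, domain, lam))
```

Only the supported constants are visited individually, so the work grows with the number of stored tuples rather than with n².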

Page 105

[Ceylan’16]

Complexity Results

Page 106

Implement PDB Query in SQL

– Convert to nested SQL recursively

– Open-world existential quantification

– Conjunction

– Run as single query!

Q = ∃x P(x) ∧ Q(x)

Open-world existential quantification (0.0001 = open-world probability; 4 = # open-world query instances; ior = Independent OR aggregate function):

    SELECT (1.0 - (1.0 - pUse) * power(1.0 - 0.0001, (4 - ct))) AS pUse
    FROM (SELECT ior(COALESCE(pUse, 0)) AS pUse,
                 count(*) AS ct
          FROM SQL(conjunction))

Conjunction, where SQL(φ) is the nested query generated for φ:

    SELECT q9.c5,
           COALESCE(q9.pUse, λ) * COALESCE(q10.pUse, λ) AS pUse
    FROM SQL(Q(X)) OUTER JOIN SQL(P(X))

SQL(Q(X)):

    SELECT Q.v0 AS c5,
           p AS pUse
    FROM Q

[Tal Friedman, Eric Gribkoff]
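What the two SQL templates compute can be mirrored in a few lines of Python. This is a sketch, not the generated SQL: the function names are hypothetical, ior is implemented as 1 - Π(1 - p), a missing side of the outer join defaults to λ as in the COALESCE, and an instance with no row at all is assumed to have probability λ·λ (the slide's query hard-codes that constant as 0.0001).

```python
from functools import reduce


def ior(probs):
    """Independent-OR aggregate: P(at least one of several independent events)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)


def exists_and(p_facts, q_facts, n_instances, lam):
    """Open-world P(∃x P(x) ∧ Q(x)), mirroring the two SQL templates.

    p_facts, q_facts: dicts constant -> probability (stored tuples only).
    """
    # Conjunction: outer join on x; a missing side defaults to lam (COALESCE).
    rows = set(p_facts) | set(q_facts)
    p_use = [p_facts.get(x, lam) * q_facts.get(x, lam) for x in rows]
    ct = len(p_use)
    # Existential: ior over the joined rows, times the (n - ct) instances
    # with no row at all, each assumed to hold with probability lam * lam.
    return 1.0 - (1.0 - ior(p_use)) * (1.0 - lam * lam) ** (n_instances - ct)


def exists_and_dense(p_facts, q_facts, domain, lam):
    """Reference: loop over every constant of the domain."""
    none_holds = 1.0
    for x in domain:
        none_holds *= 1.0 - p_facts.get(x, lam) * q_facts.get(x, lam)
    return 1.0 - none_holds


domain = ["a", "b", "c", "d"]  # 4 query instances, as in the slide
p_facts = {"a": 0.9, "b": 0.4}  # hypothetical stored tuples
q_facts = {"a": 0.8, "c": 0.5}
lam = 0.01
print(exists_and(p_facts, q_facts, len(domain), lam),
      exists_and_dense(p_facts, q_facts, domain, lam))
```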

Page 107

[Plot: OpenPDB vs ProbLog running times (s) against size of domain, 0 to 70 constants; series: PDB, ProbLog, Linear (PDB)]

Out of memory trying to run the ProbLog query with 70 constants in the domain

[Tal Friedman]

Page 109

[Plot: OpenPDB vs ProbLog running times (s) against size of domain, 0 to 3000 constants; series: PDB, ProbLog, Linear (PDB)]

12.5 million random variables!

[Tal Friedman]

Page 110

What is the broader picture?

First-Order Model Counting

Page 112

Model Counting

• Model = solution to a propositional logic formula Δ

• Model counting = #SAT

Δ = (Rain ⇒ Cloudy)

Rain   Cloudy   Model?
T      T        Yes
T      F        No
F      T        Yes
F      F        Yes

#SAT = 3

[Valiant] #P-hard, even for 2CNF
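Brute-force model counting is a direct transcription of the table: enumerate all assignments and count those that satisfy Δ. A minimal Python sketch (exponential in the number of variables, which is exactly the difficulty); encoding Δ as a Python function is an illustrative choice.

```python
import itertools


def count_models(variables, formula) -> int:
    """#SAT by brute force: count the assignments that satisfy formula."""
    count = 0
    for values in itertools.product([True, False], repeat=len(variables)):
        if formula(dict(zip(variables, values))):
            count += 1
    return count


def delta(a):
    """Δ = (Rain ⇒ Cloudy), written as ¬Rain ∨ Cloudy."""
    return (not a["Rain"]) or a["Cloudy"]


print(count_models(["Rain", "Cloudy"], delta))  # 3
```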

Page 115

Weighted Model Count

• Weights for assignments to variables

• Model weight = product of variable weights

Rain: w(R) = 1, w(¬R) = 2

Cloudy: w(C) = 3, w(¬C) = 5

Rain   Cloudy   Model?   Weight
T      T        Yes      1 * 3 = 3
T      F        No       0
F      T        Yes      2 * 3 = 6
F      F        Yes      2 * 5 = 10

WMC = 19

Δ = (Rain ⇒ Cloudy)
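The weighted count adds up, over all models, the product of the literal weights. This sketch hard-codes the slide's weights; the dictionary layout and the encoding of Δ as a Python function are illustrative choices.

```python
import itertools

# Literal weights from the slide: w(R)=1, w(¬R)=2, w(C)=3, w(¬C)=5.
WEIGHTS = {"Rain": (1, 2), "Cloudy": (3, 5)}  # (weight if true, weight if false)


def wmc(formula) -> int:
    """Sum of model weights; a model's weight is the product of its literal weights."""
    total = 0
    names = list(WEIGHTS)
    for values in itertools.product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if formula(assignment):
            weight = 1
            for name, value in assignment.items():
                w_true, w_false = WEIGHTS[name]
                weight *= w_true if value else w_false
            total += weight
    return total


def delta(a):
    return (not a["Rain"]) or a["Cloudy"]  # Rain ⇒ Cloudy


print(wmc(delta))  # 19
```

With every weight set to 1 the same function returns the plain model count.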

Page 116

Assembly language for probabilistic reasoning

Bayesian networks

Factor graphs

Probabilistic databases

Relational Bayesian networks

Probabilistic logic programs

Markov Logic

Weighted Model Counting

[Chavira 2006, Chavira 2008, Sang 2005, Fierens 2015]

Page 117

Simple Reasoning Problem

Probability that Card1 is Hearts? 1/4

[Van den Broeck; AAAI-KRR’15]
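The answer 1/4 can be obtained by model counting: count the shuffles with a heart in position 1 and divide by the number of all shuffles. To keep enumeration feasible, this sketch uses a hypothetical 8-card deck (two ranks per suit) instead of 52 cards. The ratio is the same, but the number of shuffles grows as n!, which is why grounded enumeration does not scale.

```python
import itertools
from fractions import Fraction

SUITS = ["♥", "♦", "♣", "♠"]
RANKS = ["A", "K"]  # hypothetical 8-card deck; the slide uses all 13 ranks
DECK = [r + s for s in SUITS for r in RANKS]


def prob_first_card_is_hearts() -> Fraction:
    """Count shuffles (models) with a heart at position 1, divide by all shuffles."""
    good = total = 0
    for order in itertools.permutations(DECK):
        total += 1
        if order[0].endswith("♥"):
            good += 1
    return Fraction(good, total)


print(prob_first_card_is_hearts())  # 1/4
```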

Page 118

Model distribution by FOMC:

Δ = ∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c’, Card(p,c) ∧ Card(p,c’) ⇒ c = c’

[Van den Broeck 2015]

Page 121

Beyond NP Pipeline for #P

Reduce to propositional model counting:

Δ = Card(A♥,p1) ∨ … ∨ Card(2♣,p1)

Card(A♥,p2) ∨ … ∨ Card(2♣,p2)

Card(A♥,p1) ∨ … ∨ Card(A♥,p52)

Card(K♥,p1) ∨ … ∨ Card(K♥,p52)

¬Card(A♥,p1) ∨ ¬Card(A♥,p2)

¬Card(A♥,p1) ∨ ¬Card(A♥,p3)

What will happen?

[Van den Broeck 2015]

Page 124

Deck of Cards Graphically

[Figure: bipartite graph linking cards (A♥, 2♥, 3♥, …, K♥) to deck positions]

One model = one perfect matching

[VdB’15]

Page 127

Deck of Cards Graphically

[Figure: bipartite graph linking cards (A♥, 2♥, 3♥, …, K♥) to deck positions; one edge is labeled Card(K♥,p52)]

Model counting: How many perfect matchings?

[VdB’15]

Page 129

Deck of Cards Graphically

[Figure: bipartite graph linking cards (A♥, 2♥, 3♥, …, K♥) to deck positions]

What if I set w(Card(K♥,p52)) = 0?

[VdB’15]

Page 132

Observations

• Weight function = bipartite graph

• # models = # perfect matchings

• Problem is #P-complete!

What is going on here?

[VdB’15]

No propositional WMC solver can handle the cards problem efficiently!
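The observation can be illustrated by counting perfect matchings directly. The sketch treats each 0/1 weight w(Card(c,p)) as an edge of the bipartite graph and sums over all bijections, which is the permanent of the weight matrix; the 4-card deck is hypothetical and the brute force is exponential.

```python
import itertools


def count_perfect_matchings(allowed) -> int:
    """Sum over bijections card i -> position perm[i] of the product of 0/1 weights.

    allowed[i][j] is the weight of Card(card_i, position_j): 1 or 0.
    This is the permanent of the matrix, computed by brute force.
    """
    n = len(allowed)
    total = 0
    for perm in itertools.permutations(range(n)):
        weight = 1
        for i in range(n):
            weight *= allowed[i][perm[i]]
        total += weight
    return total


n = 4  # hypothetical 4-card deck
full = [[1] * n for _ in range(n)]
print(count_perfect_matchings(full))  # 24 = 4!

# Setting one weight to 0 removes that edge from the bipartite graph.
pruned = [row[:] for row in full]
pruned[n - 1][n - 1] = 0  # analogue of w(Card(K♥,p52)) = 0
print(count_perfect_matchings(pruned))  # 18 = 4! - 3!
```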

Page 133

Symmetric Weighted FOMC

No database! No literal-specific weights!

Def. A weighted vocabulary is (R, w), where

– $R = (R_1, R_2, \dots, R_k)$ = relational vocabulary

– $w = (w_1, w_2, \dots, w_k)$ = weights

– Implicit weights: $w(R_i(t)) = w_i$

Special case: $w_i = 1$ is model counting

Complexity in terms of domain size n

Page 135

FOMC Inference Rules

• Simplification to ∃,∀ rules:

For example:

$P(\forall z\, Q) = P(Q[C_1/z])^{|\mathrm{Domain}|}$

The workhorse of FOMC

• A powerful new inference rule: atom counting

Only possible with symmetric weights

Intuition: Remove unary relations

[VdB’11]
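The ∀-rule can be checked on a small example. The sketch assumes Q = R(z) ∨ S(z) with the same probability for every R-atom and every S-atom (symmetric weights), so the groundings for different constants share no atoms and all have the same probability; the domain size and probabilities are hypothetical.

```python
import itertools


def brute_force_forall(n: int, p_r: float, p_s: float) -> float:
    """P(∀z R(z) ∨ S(z)) by enumerating all 2^(2n) worlds."""
    total = 0.0
    for r in itertools.product([False, True], repeat=n):
        for s in itertools.product([False, True], repeat=n):
            weight = 1.0
            for i in range(n):
                weight *= p_r if r[i] else 1.0 - p_r
                weight *= p_s if s[i] else 1.0 - p_s
            if all(r[i] or s[i] for i in range(n)):
                total += weight
    return total


def lifted_forall(n: int, p_r: float, p_s: float) -> float:
    """P(∀z Q) = P(Q[C1/z])^|Domain| with Q = R(z) ∨ S(z)."""
    p_one = 1.0 - (1.0 - p_r) * (1.0 - p_s)  # P(R(C1) ∨ S(C1))
    return p_one ** n


print(brute_force_forall(4, 0.3, 0.6), lifted_forall(4, 0.3, 0.6))
```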

Page 150: Open-World Probabilistic Databasesstarai.cs.ucla.edu/slides/GCAI17.pdfGoogle Knowledge Graph > 570 million entities > 18 billion tuples • Tuple-independent probabilistic database

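First-Order Model Counting: Example

Δ = ∀x, y ∈ People: Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)

Database: Smokes(Alice) = 1, Smokes(Bob) = 0, Smokes(Charlie) = 0, Smokes(Dave) = 1, Smokes(Eve) = 0, ...

[Figure: the n people split into k smokers and n-k non-smokers; a Friends edge from a smoker to a non-smoker would violate Δ.]

If we know D precisely: who smokes, and there are k smokers?
→ $2^{n^2 - k(n-k)}$ models

If we know only that there are k smokers?
→ $\binom{n}{k}\, 2^{n^2 - k(n-k)}$ models

In total…
→ $\sum_{k=0}^{n} \binom{n}{k}\, 2^{n^2 - k(n-k)}$ models

[Van den Broeck 2015]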
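To make the counting argument concrete, here is a small sketch (our own illustration, not code from the talk; function names are hypothetical). Only the k(n-k) Friends atoms from a smoker to a non-smoker are constrained, and they must be false; every other Friends atom is free. The closed-form sum is checked against brute-force enumeration of all Smokes/Friends worlds for small n.

```python
from itertools import product
from math import comb


def fomc_lifted(n: int) -> int:
    """Closed-form model count of Δ = ∀x,y: Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)."""
    # Choose the k smokers; the k*(n-k) smoker -> non-smoker Friends atoms are
    # forced false, the remaining n^2 - k*(n-k) Friends atoms are unconstrained.
    return sum(comb(n, k) * 2 ** (n * n - k * (n - k)) for k in range(n + 1))


def fomc_brute_force(n: int) -> int:
    """Enumerate every world over Smokes/Friends and count those satisfying Δ."""
    people = range(n)
    pairs = [(x, y) for x in people for y in people]
    count = 0
    for smokes in product([False, True], repeat=n):
        for bits in product([False, True], repeat=len(pairs)):
            friends = dict(zip(pairs, bits))
            # (A ∧ B) ⇒ C is equivalent to ¬(A ∧ B) ∨ C
            if all(not (smokes[x] and friends[(x, y)]) or smokes[y] for x, y in pairs):
                count += 1
    return count


for n in range(1, 4):
    print(n, fomc_lifted(n), fomc_brute_force(n))
```

The lifted version sums n+1 terms, while the brute-force version visits 2^(n + n²) worlds, which is why it is only usable as a check for tiny domains.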


...

Playing Cards Revisited

∀p, ∃c, Card(p,c)

∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c’, Card(p,c) ∧ Card(p,c’) ⇒ c = c’

Model count computed in time polynomial in n

[Van den Broeck; AAAI-KR’15]
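Assuming n players and n cards (the slide does not fix the two domain sizes, so this is our assumption), the three sentences say that Card is total on players, covers every card, and gives each player at most one card, i.e. it is a bijection, so the model count should be n!. The sketch below is only a brute-force sanity check over all 2^(n²) Card relations, not the lifted algorithm from the cited work.

```python
from itertools import product
from math import factorial


def count_card_models(n: int) -> int:
    """Brute-force model count of the three Card sentences, assuming n players and n cards."""
    players = cards = range(n)
    pairs = [(p, c) for p in players for c in cards]
    count = 0
    for bits in product([False, True], repeat=len(pairs)):
        card = dict(zip(pairs, bits))
        # ∀p ∃c Card(p,c): every player holds some card
        each_player_has_card = all(any(card[(p, c)] for c in cards) for p in players)
        # ∀c ∃p Card(p,c): every card is held by some player
        each_card_is_held = all(any(card[(p, c)] for p in players) for c in cards)
        # ∀p,c,c' Card(p,c) ∧ Card(p,c') ⇒ c = c': no player holds two cards
        at_most_one_each = all(sum(card[(p, c)] for c in cards) <= 1 for p in players)
        if each_player_has_card and each_card_is_held and at_most_one_each:
            count += 1
    return count


for n in range(1, 4):
    print(n, count_card_models(n), factorial(n))
```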


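Open-World Lifted Query Eval

All together, probability $(1-p)^k$

Open-world query evaluation on an empty database = symmetric first-order model counting

Q = ∃x ∃y Smoker(x) ∧ Friend(x,y)

$$
\begin{aligned}
P(Q) &= 1 - \prod_{A \in \mathrm{Domain}} \bigl(1 - P(\mathrm{Scientist}(A) \land \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&= 1 - \bigl(1 - P(\mathrm{Scientist}(A) \land \exists y\, \mathrm{Coauthor}(A,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(B) \land \exists y\, \mathrm{Coauthor}(B,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(C) \land \exists y\, \mathrm{Coauthor}(C,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(D) \land \exists y\, \mathrm{Coauthor}(D,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(E) \land \exists y\, \mathrm{Coauthor}(E,y))\bigr) \\
&\qquad \times \bigl(1 - P(\mathrm{Scientist}(F) \land \exists y\, \mathrm{Coauthor}(F,y))\bigr)
\end{aligned}
$$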
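When the database is symmetric, the product above collapses into a closed form. The sketch below assumes (hypothetically) that every Smoker tuple has the same probability s and every Friend tuple the same probability f, independently, as happens for example when no tuples are observed and every open tuple is assigned the same probability. For one person a, P(∃y Friend(a,y)) = 1 - (1-f)^n, the same "(1-p)^k" pattern as above, and distinct people touch disjoint tuples, so their factors multiply. A brute-force sum over all worlds checks the result for small n.

```python
from itertools import product


def lifted_prob(n: int, s: float, f: float) -> float:
    """P(∃x ∃y Smoker(x) ∧ Friend(x,y)) when every Smoker tuple has probability s
    and every Friend tuple has probability f (symmetric, tuple-independent)."""
    p_has_friend = 1 - (1 - f) ** n        # P(∃y Friend(a,y)) for one person a
    p_person = s * p_has_friend            # P(Smoker(a) ∧ ∃y Friend(a,y))
    # Different people use disjoint tuples, so these events are independent.
    return 1 - (1 - p_person) ** n


def brute_force_prob(n: int, s: float, f: float) -> float:
    """Sum the probabilities of all worlds that satisfy Q (small n only)."""
    people = range(n)
    pairs = [(x, y) for x in people for y in people]
    total = 0.0
    for smoker in product([False, True], repeat=n):
        for bits in product([False, True], repeat=len(pairs)):
            friend = dict(zip(pairs, bits))
            if not any(smoker[x] and friend[(x, y)] for x, y in pairs):
                continue
            weight = 1.0
            for x in people:
                weight *= s if smoker[x] else 1 - s
            for pair in pairs:
                weight *= f if friend[pair] else 1 - f
            total += weight
    return total


for n in range(1, 4):
    print(n, lifted_prob(n, 0.3, 0.2), brute_force_prob(n, 0.3, 0.2))
```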


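FO2 is liftable! (FO2: first-order sentences that use at most two logical variables.)

[Figure: logical variables X and Y, each with its own properties, linked by binary relations.]

Properties: Smokes(x), Gender(x), Young(x), Tall(x), and likewise Smokes(y), Gender(y), Young(y), Tall(y)

Relations: Friends(x,y), Colleagues(x,y), Family(x,y), Classmates(x,y)

Example statements over these properties and relations:

• “Smokers are more likely to be friends with other smokers.”

• “Colleagues of the same age are more likely to be friends.”

• “People are either family or friends, but never both.”

• “If X is family of Y, then Y is also family of X.”

• “If X is a parent of Y, then Y cannot be a parent of X.”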
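As a small illustration of why two-variable sentences lend themselves to lifting, the sketch below hand-derives a closed form for two of the example statements read as hard constraints. Our reading (an assumption) is “never both” as ∀x,y ¬(Family(x,y) ∧ Friends(x,y)) and symmetry as ∀x,y Family(x,y) ⇒ Family(y,x). Each diagonal pair (x,x) then has 3 allowed assignments and each unordered off-diagonal pair {x,y} has 5, giving 3^n · 5^(n(n-1)/2) models; the brute-force function checks this for small n.

```python
from itertools import product


def count_lifted(n: int) -> int:
    # Diagonal (x, x): Family true forces Friends false (1), Family false leaves Friends free (2).
    # Off-diagonal {x, y}: Family(x,y) = Family(y,x) = b; b true forces both Friends
    # atoms false (1), b false leaves both free (4).
    return 3 ** n * 5 ** (n * (n - 1) // 2)


def count_brute_force(n: int) -> int:
    people = range(n)
    pairs = [(x, y) for x in people for y in people]
    count = 0
    for fam_bits in product([False, True], repeat=len(pairs)):
        family = dict(zip(pairs, fam_bits))
        if any(family[(x, y)] and not family[(y, x)] for x, y in pairs):
            continue  # violates ∀x,y Family(x,y) ⇒ Family(y,x)
        for fr_bits in product([False, True], repeat=len(pairs)):
            friends = dict(zip(pairs, fr_bits))
            if all(not (family[p] and friends[p]) for p in pairs):
                count += 1
    return count


for n in range(1, 4):
    print(n, count_lifted(n), count_brute_force(n))
```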


Tractable Classes

[Figure: classes FO2, FO3, CNF, CQs, safe monotone CNF and safe type-1 CNF, with three regions labelled #P1 and one region labelled “?”.]

Δ = ∀x,y,z, Friends(x,y) ∧ Friends(y,z) ⇒ Friends(x,z)

[VdB; NIPS’11], [VdB et al.; KR’14], [Gribkoff, VdB, Suciu; UAI’15], [Beame, VdB, Gribkoff, Suciu; PODS’15], etc.
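The transitivity sentence Δ is the three-variable example attached to the “?” on this slide. The sketch below (our illustration, with hypothetical names) shows only the naive baseline: it counts models of Δ by enumerating all 2^(n²) Friends relations, which is exponential in n and is exactly the kind of grounding a lifted algorithm tries to avoid.

```python
from itertools import product


def count_transitive_models(n: int) -> int:
    """Brute-force model count of Δ = ∀x,y,z: Friends(x,y) ∧ Friends(y,z) ⇒ Friends(x,z)."""
    people = range(n)
    pairs = [(x, y) for x in people for y in people]
    count = 0
    for bits in product([False, True], repeat=len(pairs)):
        friends = {pair for pair, bit in zip(pairs, bits) if bit}
        # Every path x -> y -> z must be closed by an edge x -> z.
        if all((x, z) in friends
               for (x, y) in friends
               for (y2, z) in friends if y2 == y):
            count += 1
    return count


print([count_transitive_models(n) for n in range(1, 4)])  # → [2, 13, 171]
```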


Statistical Relational Learning

Markov Logic: 3.14 Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)

FOL Sentence: ∀x,y, F(x,y) ⇔ [ Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y) ]

Weight Function: w(Smokes) = 1, w(¬Smokes) = 1, w(Friends) = 1, w(¬Friends) = 1, w(F) = 3.14, w(¬F) = 1

First-Order d-DNNF Circuit [figure; arrows labelled “Compile?”]

Domain: Alice, Bob, Charlie

Z = WFOMC = 1479.85

Evaluation in time polynomial in domain size!

[Van den Broeck, PhD’13]


Lifted Machine Learning

• Given: a set of first-order logic formulas and a set of training databases

• Learn: Maximum-likelihood weights

• Also structure learning!

[Van Haaren et al.; MLJ’15]

900,030,000 random variables
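A minimal sketch of maximum-likelihood weight learning, assuming a single formula Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y) with weight w = exp(θ) and one hypothetical training database over 4 people. This one-formula setting and the bisection solver are our own illustrative choices, not the method of [Van Haaren et al.; MLJ’15]. For such log-linear models the likelihood is maximized where the expected number of true groundings E_θ[N] equals the observed count, and E_θ[N] is nondecreasing in θ, so we solve that equation by bisection. E_θ[N] is computed with a lifted sum over the number k of smokers, so each step stays polynomial in n; a brute-force version is included as a check.

```python
import math
from itertools import product


def lifted_stats(n: int, theta: float):
    """Return (Z, E[N]) for weight w = exp(theta), where N counts true groundings."""
    w = math.exp(theta)
    z = 0.0
    weighted_n = 0.0
    for k in range(n + 1):                 # k = number of smokers
        m = k * (n - k)                    # smoker -> non-smoker pairs
        r = n * n - m                      # pairs where the formula always holds
        term = math.comb(n, k) * (1 + w) ** m * (2 * w) ** r
        z += term
        # each of the m pairs is a true grounding with probability w / (1 + w)
        weighted_n += term * (m * w / (1 + w) + r)
    return z, weighted_n / z


def count_true_groundings(smokes, friends, n: int) -> int:
    return sum(1 for x in range(n) for y in range(n)
               if not (smokes[x] and (x, y) in friends) or smokes[y])


def brute_force_stats(n: int, theta: float):
    """Same quantities by enumerating every Smokes/Friends world (small n only)."""
    pairs = [(x, y) for x in range(n) for y in range(n)]
    z = 0.0
    weighted_n = 0.0
    for smokes in product([False, True], repeat=n):
        for bits in product([False, True], repeat=len(pairs)):
            friends = {p for p, b in zip(pairs, bits) if b}
            c = count_true_groundings(smokes, friends, n)
            z += math.exp(theta * c)
            weighted_n += c * math.exp(theta * c)
    return z, weighted_n / z


def learn_theta(n: int, observed: int, lo: float = -20.0, hi: float = 20.0) -> float:
    """Bisection on E_theta[N] = observed (E_theta[N] is nondecreasing in theta)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if lifted_stats(n, mid)[1] < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical training database over 4 people.
n = 4
smokes = [True, True, False, False]
friends = {(0, 2), (1, 1), (2, 3)}
observed = count_true_groundings(smokes, friends, n)  # 15: only (0, 2) violates the formula
theta_hat = learn_theta(n, observed)
print(observed, theta_hat, math.exp(theta_hat))
```

A finite maximum-likelihood weight exists here only because the observed count lies strictly between the smallest and largest achievable counts; a database satisfying every grounding would push θ toward infinity.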


The Even Broader Picture

• Statistical relational learning (e.g., Markov logic)

  – Open-domain models (BLOG)

• Probabilistic description logics

• Certain query answers in databases

• Open information extraction

• Learning from positive-only examples

• Imprecise probabilities

  – Credal sets, interval probability, qualitative uncertainty

• Credal Bayesian networks


Conclusions

• Relational probabilistic reasoning is a frontier and an integration of AI, KR, ML, DB, TH, etc.

• We need

– relational models and logic

– probabilistic models and statistical learning

– algorithms that scale

• Open-world data model

– semantics makes sense

– FREE for UCQs, expensive otherwise

– deep connection to model counting


References

• Ceylan, Ismail Ilkan, Adnan Darwiche, and Guy Van den Broeck. "Open-world probabilistic databases." Proceedings of KR (2016).

• Suciu, Dan, Dan Olteanu, Christopher Ré, and Christoph Koch. "Probabilistic databases." Synthesis Lectures on Data Management 3, no. 2 (2011): 1-180.

• Dong, Xin, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. "Knowledge vault: A web-scale approach to probabilistic knowledge fusion." In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 601-610. ACM, 2014.

• Carlson, Andrew, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr, and Tom M. Mitchell. "Toward an Architecture for Never-Ending Language Learning." In AAAI, vol. 5, p. 3. 2010.

• Niu, Feng, Ce Zhang, Christopher Ré, and Jude W. Shavlik. "DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference." VLDS 12 (2012): 25-28.


• Chen, Brian X. "Siri, Alexa and Other Virtual Assistants Put to the Test." The New York Times (2016).

• Dalvi, Nilesh, and Dan Suciu. "The dichotomy of probabilistic inference for unions of conjunctive queries." Journal of the ACM (JACM) 59, no. 6 (2012): 30.

• De Raedt, Luc, Anton Dries, Ingo Thon, Guy Van den Broeck, and Mathias Verbeke. "Inducing probabilistic relational rules from probabilistic examples." In Proceedings of the 24th International Conference on Artificial Intelligence, pp. 1835-1843. AAAI Press, 2015.

• Van den Broeck, Guy. "Towards high-level probabilistic reasoning with lifted inference." AAAI Spring Symposium on KRR (2015).

• Niepert, Mathias, and Guy Van den Broeck. "Tractability through exchangeability: A new perspective on efficient probabilistic inference." AAAI (2014).

• Van den Broeck, Guy. "On the completeness of first-order knowledge compilation for lifted probabilistic inference." In Advances in Neural Information Processing Systems, pp. 1386-1394. 2011.


• Van den Broeck, Guy, Wannes Meert, and Adnan Darwiche. "Skolemization for weighted first-order model counting." In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR). 2014.

• Gribkoff, Eric, Guy Van den Broeck, and Dan Suciu. "Understanding the complexity of lifted inference and asymmetric weighted model counting." UAI, 2014.

• Beame, Paul, Guy Van den Broeck, Eric Gribkoff, and Dan Suciu. "Symmetric weighted first-order model counting." In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 313-328. ACM, 2015.

• Chavira, Mark, and Adnan Darwiche. "On probabilistic inference by weighted model counting." Artificial Intelligence 172.6 (2008): 772-799.

• Sang, Tian, Paul Beame, and Henry A. Kautz. "Performing Bayesian inference by weighted model counting." AAAI. Vol. 5. 2005.


• Van den Broeck, Guy, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. "Lifted probabilistic inference by first-order knowledge compilation." In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence, pp. 2178-2185. AAAI Press/International Joint Conferences on Artificial Intelligence, 2011.

• Van den Broeck, Guy. Lifted inference and learning in statistical relational models. PhD dissertation, KU Leuven, 2013.

• Gogate, Vibhav, and Pedro Domingos. "Probabilistic theorem proving." UAI (2011).

• Guy Van den Broeck and Dan Suciu. Query Processing on Probabilistic Data: A Survey, Foundations and Trends in Databases, Now Publishers, 2017


• Belle, Vaishak, Andrea Passerini, and Guy Van den Broeck. "Probabilistic inference in hybrid domains by weighted model integration." Proceedings of 24th International Joint Conference on Artificial Intelligence (IJCAI). 2015.

• Belle, Vaishak, Guy Van den Broeck, and Andrea Passerini. "Hashing-based approximate probabilistic inference in hybrid domains." In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI). 2015.

• Fierens, Daan, Guy Van den Broeck, Joris Renkens, Dimitar Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. "Inference and learning in probabilistic logic programs using weighted boolean formulas." Theory and Practice of Logic Programming 15, no. 03 (2015): 358-401.