Sven Köhler 1 Bertram Ludäscher 1,3 Yannis Smaragdakis 2,3 1 University of California, Davis 2 University of Athens, Greece 3 LogicBlox Inc., Atlanta, USA UC DAVIS Department of Computer Science Datalog2.0, Vienna Logic Weeks, 2012

Declarative Datalog Debugging for Mere Mortals

Mar 17, 2018

Page 1: Declarative Datalog Debugging for Mere Mortals

Sven Köhler¹, Bertram Ludäscher¹,³, Yannis Smaragdakis²,³

¹ University of California, Davis
² University of Athens, Greece
³ LogicBlox Inc., Atlanta, USA

UC DAVIS Department of Computer Science

Datalog2.0, Vienna Logic Weeks, 2012

Page 2: Declarative Datalog Debugging for Mere Mortals

Outline

• Motivation
  – Debugging and Profiling Declarative Rules
• Basic Idea
  – Capture derivations (provenance) in an enriched model M’
  – … then run Datalog queries on M’ (when in Rome, …)
• Simple “Tricks” for Mere Mortals
  – F: record rule firings (T_P instances)
  – G: reify firings as nodes in a firing graph
  – S: keep track of firing stages (Statelog)
  – Query the enriched model (provenance graph)!
• Debugging and Profiling Examples
• Musings & Conclusions
  – Graph-based Provenance Analyzer (GPad/DLV, GPad/LB)

Page 3: Declarative Datalog Debugging for Mere Mortals

Declarative Debugging

• Resurgence/Renaissance of Datalog …
  – … and Declarative Programming
    ▪ e.g., parallel programming beyond MapReduce
  – … an old dream: Executable Specifications!
• But writing large declarative programs is still tricky
  – 9-valued logics anyone? (cf. Kunen’s PhD effect)
  – How can we empower “regular” Datalog programmers (mere mortals)?
  → Simple tools & techniques for debugging (and profiling)
• Ideally:
  – Don’t tie the approach to a particular computation model
  – Instead devise a declarative debugging approach
  → should work for different implementations

Page 4: Declarative Datalog Debugging for Mere Mortals

Running Example

Page 5: Declarative Datalog Debugging for Mere Mortals

Pop Quiz: Why/how come tc(a,b) ?

• Why/how is (a,b) in the transitive closure tc of e?
• What about ?-tc(e,X) vs ?-tc(X,e)

Page 6: Declarative Datalog Debugging for Mere Mortals

Declarative Debugging: Prolog

Page 7: Declarative Datalog Debugging for Mere Mortals

Declarative Debugging: Prolog

Page 8: Declarative Datalog Debugging for Mere Mortals

Hmm.. Many answers …

(infinitely many …)

Page 9: Declarative Datalog Debugging for Mere Mortals

[Figure: Prolog SLD tree for the query ?-tc(a,b). The root expands to :-e(a,b) and :-e(a,_G263),tc(_G263,b). Deeper levels keep repeating the subgoals :-tc(b,b), :-tc(c,b), and :-tc(d,b), and the answer tc(a,b) is found again and again.]

Annotations in the figure:
• (Infinitely!) Many branches saying “Yes”
• Different ways to say “No”!
  → finitely failed tree
  → infinitely failed tree …

Page 10: Declarative Datalog Debugging for Mere Mortals

Declarative Debugging: DATALOG

Page 11: Declarative Datalog Debugging for Mere Mortals

Debug this!

• Evaluating P on I yields model M_P(I)
• Too much information!
  … but also …
• Not enough information!

Page 12: Declarative Datalog Debugging for Mere Mortals

Declarative Debugging: DATALOG

• Scope of this paper:
  – positive Datalog programs (recursion OK, negation: not yet)
  – Why/How provenance (but no Why-Not provenance … yet)

Page 13: Declarative Datalog Debugging for Mere Mortals

Some Debugging and Profiling Use Cases

Page 14: Declarative Datalog Debugging for Mere Mortals

Solving the Provenance Quiz

• Example: Datalog program P = {
    [r1] tc(X,Y) :- e(X,Y).
    [r2] tc(X,Y) :- e(X,Z), tc(Z,Y). }

• EDB instance I = { e(a,b). e(b,c). e(c,b). e(c,d). }

• Question: How can we justify / explain …
  – … why (how) it is that tc(a,b) is in M_P(I)?
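
To make the quiz concrete, the model M_P(I) can be computed by naive bottom-up evaluation. The following is only a minimal Python sketch, not the authors' tooling: the names `E`, `model_tc`, and `TC` are made up for illustration, and relations are represented as sets of tuples.

```python
# EDB instance I from the slide: e(a,b). e(b,c). e(c,b). e(c,d).
E = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}

def model_tc(edges):
    """Naive bottom-up evaluation of r1 and r2 until a fixpoint is reached."""
    tc = set()
    while True:
        # r1: tc(X,Y) :- e(X,Y).
        new = set(edges)
        # r2: tc(X,Y) :- e(X,Z), tc(Z,Y).
        new |= {(x, y) for (x, z) in edges for (z2, y) in tc if z == z2}
        if new <= tc:          # nothing new was derived: fixpoint
            return tc
        tc |= new

TC = model_tc(E)
print(sorted(TC))
```

The result contains tc(a,b), but the plain model says nothing about which rule firings produced it. That is the gap the following slides address.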

Page 15: Declarative Datalog Debugging for Mere Mortals

[r1] tc(X,Y) :- e(X,Y)
[r2] tc(X,Y) :- e(X,Z), tc(Z,Y)

Page 16: Declarative Datalog Debugging for Mere Mortals

[r1] tc(X,Y) :- e(X,Y)
[r2] tc(X,Y) :- e(X,Z), tc(Z,Y)

A firing [F] → (H) is called unfounded if all derivations of F require H as an assumption!

Here tc(a,b) has (at least) two different derivations, neither of which is unfounded. However, [r2] → tc(c,b) is unfounded: the firing of r2 depends on tc(b,b), which can only be derived by already assuming the desired conclusion tc(c,b)!

Page 17: Declarative Datalog Debugging for Mere Mortals

DATALOG Rewritings (GPAD)

Firing graph:
• captures the “full” provenance
• reasonable overhead!?
• has been/is being used (e.g. Orchestra, LogicBlox)
• can be easily constructed!

• Provenance-enabled Debugging and Profiling for the rest of us!

Page 18: Declarative Datalog Debugging for Mere Mortals

Step 1: Capturing Rule Firings (“F-trick”)

• Capture rule firings and keep “witness info” (existential variables)
  – no premature projections in the rule head please!

• Example. Instead of a given rule …

    tc(X,Y) :- e(X,Z), tc(Z,Y).

  … we rather use these two rules, keeping witnesses Z around:

    fire2(X,Z,Y) :- e(X,Z), tc(Z,Y).
    tc(X,Y) :- fire2(X,Z,Y).

[Figure: Example rule firings]
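
The F-trick can be tried out on the running example with a few lines of Python. This is a sketch under assumptions: `fire1` (a firing relation for rule r1) does not appear on the slide and is introduced here by analogy with `fire2`, and `eval_with_firings` is a hypothetical helper name.

```python
# EDB instance of the running example.
E = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}

def eval_with_firings(edges):
    """Evaluate the rewritten program and keep the firing relations."""
    tc, fire1, fire2 = set(), set(), set()
    changed = True
    while changed:
        # fire1(X,Y) :- e(X,Y).              (assumed analogue for rule r1)
        f1 = set(edges)
        # fire2(X,Z,Y) :- e(X,Z), tc(Z,Y).   (the witness Z is kept)
        f2 = {(x, z, y) for (x, z) in edges for (z2, y) in tc if z == z2}
        # tc(X,Y) :- fire1(X,Y).   tc(X,Y) :- fire2(X,Z,Y).
        heads = f1 | {(x, y) for (x, _, y) in f2}
        changed = not (f1 <= fire1 and f2 <= fire2 and heads <= tc)
        fire1 |= f1
        fire2 |= f2
        tc |= heads
    return tc, fire1, fire2

tc, fire1, fire2 = eval_with_firings(E)
# Witnesses Z for the r2 firings that derive tc(a,b):
witnesses = sorted(z for (x, z, y) in fire2 if (x, y) == ("a", "b"))
print(witnesses)
```

Projecting `fire2` onto (X,Y) gives back exactly the tc facts that r2 derives, so the rewritten program computes the same tc relation as the original one.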

Page 19: Declarative Datalog Debugging for Mere Mortals

Step 2: Graph Transformation (“G-trick”)

• Reify provenance atoms & firings in a labeled graph g/3
• Example for N = 2 subgoals and 1 head atom …

    fire2(X,Z,Y) :- e(X,Z), tc(Z,Y).   % two in-edges
    tc(X,Y) :- fire2(X,Z,Y).           % one out-edge

  … generates N+1 “reification rules” (Skolems are safe):

    g( e(X,Z), in, skfire2(X,Z,Y) ) :- fire2(X,Z,Y).
    g( tc(Z,Y), in, skfire2(X,Z,Y) ) :- fire2(X,Z,Y).
    g( skfire2(X,Z,Y), out, tc(X,Y) ) :- fire2(X,Z,Y).

[Figure: Example instance generated by these rules: e(a,b) and tc(b,d) each have an in-edge to the firing node fire2(a,b,d), which has an out-edge to tc(a,d).]
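
The three reification rules translate almost literally into code. The sketch below is illustrative only: it encodes atoms and Skolem terms as Python tuples, and `reify` is a hypothetical helper name.

```python
def reify(fire2):
    """Turn fire2(X,Z,Y) facts into labeled edges (source, label, target)."""
    g = set()
    for (x, z, y) in fire2:
        node = ("skfire2", x, z, y)            # Skolem term naming this firing
        g.add((("e", x, z), "in", node))       # first subgoal  -> firing
        g.add((("tc", z, y), "in", node))      # second subgoal -> firing
        g.add((node, "out", ("tc", x, y)))     # firing -> head atom
    return g

# The single firing from the slide's example instance.
G = reify({("a", "b", "d")})
for edge in sorted(G, key=str):
    print(edge)
```

Each firing contributes N+1 = 3 edges, which matches the count of reification rules on the slide.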

Page 20: Declarative Datalog Debugging for Mere Mortals

Step 3: Using Statelog (“S-Trick”)

• Use Statelog to keep record of firing rounds:
  – Add state (= stage) argument to provenance rules and graph relations
  – EDB facts are derived in state 0.
  – Subsequently: extract earliest round for firings and IDB facts

• Example:

    r_in:  fire_r(S1, X) :- B1(S, X1), … , Bn(S, Xn), next(S, S1).
    r_out: H(S, Y) :- fire_r(S, X).

[Figure: Firing graph annotated with stage numbers. EDB facts: e(a,b), e(b,c), e(c,b). Firings: r1 [1] into tc(a,b), r1 [1] into tc(c,b), r2 [2] into tc(b,b), r2 [3] into tc(a,b), r2 [3] into tc(c,b). Derived atoms: tc(a,b)[1], tc(b,b)[2], tc(c,b)[1].]
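
The stage numbers in the figure can be reproduced with a small stage-aware evaluation. This is a sketch, not Statelog itself: it assumes that derived facts persist across states, records only the earliest state per firing and atom, and uses hypothetical names (`first_stages`, `tc_stage`, `firing_stage`).

```python
E = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}

def first_stages(edges):
    tc_stage = {}       # tc atom (X,Y) -> earliest state in which it holds
    firing_stage = {}   # firing        -> earliest state in which it occurs
    s = 0               # EDB facts hold in state 0
    while True:
        s1 = s + 1      # next(S, S1)
        # Firings in state s1 only use facts that already hold in state s.
        fired = {("r1", x, y): (x, y) for (x, y) in edges}
        fired.update({("r2", x, z, y): (x, y)
                      for (x, z) in edges for (z2, y) in tc_stage if z == z2})
        new_firings = {f: h for f, h in fired.items() if f not in firing_stage}
        if not new_firings:
            return tc_stage, firing_stage
        for f, head in new_firings.items():
            firing_stage[f] = s1
            tc_stage.setdefault(head, s1)   # the head holds in the firing's state
        s = s1

tc_stage, firing_stage = first_stages(E)
print(tc_stage[("a", "b")], firing_stage[("r2", "a", "b", "b")])
```

The firing of r2 that derives tc(a,b) first occurs in state 3, although tc(a,b) itself already holds in state 1. A firing whose head was derived in an earlier state is what the profiling slides call a rederivation.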

Page 21: Declarative Datalog Debugging for Mere Mortals

How long (does it take) Provenance!

• These definitions are recursive but well-founded
• The numbers can be easily obtained via Statelog

Page 22: Declarative Datalog Debugging for Mere Mortals

More Provenance Querying

• Provenance Views:
  – Provenance subgraph relevant for debug atom Q:

    ProvView(Q,X, out, Q) :- g(_,X,out,Q).
    ProvView(Q,X, L, Y) :- ProvView(Q,Y,_,_), g(_,X,L,Y).

• Length of derivations:
  – first round this firing occurred:

    len(F,LenF) :- newFiring(S,F), LenF=S.

• Length of an atom:
  – first round it was derived:

    len(A,LenA) :- newAtom(S,A), LenA=S.
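
The two ProvView rules describe a backward traversal of the firing graph starting at Q. The Python sketch below mirrors them on a hand-written graph for the running example. It is an illustration under assumptions: the state argument of g is dropped, nodes are plain strings, and `firing`, `G`, and `prov_view` are hypothetical names.

```python
def firing(name, body, head):
    """Edges of one firing node: an in-edge per subgoal, one out-edge to the head."""
    return [(b, "in", name) for b in body] + [(name, "out", head)]

# Hand-written firing graph for part of the running example.
G = set(
    firing("r1(a,b)", ["e(a,b)"], "tc(a,b)")
    + firing("r1(c,b)", ["e(c,b)"], "tc(c,b)")
    + firing("r1(c,d)", ["e(c,d)"], "tc(c,d)")
    + firing("r2(b,c,b)", ["e(b,c)", "tc(c,b)"], "tc(b,b)")
    + firing("r2(a,b,b)", ["e(a,b)", "tc(b,b)"], "tc(a,b)")
    + firing("r2(c,b,b)", ["e(c,b)", "tc(b,b)"], "tc(c,b)")
)

def prov_view(g, q):
    # ProvView(Q,X,out,Q) :- g(X,out,Q).
    view = {(x, l, y) for (x, l, y) in g if l == "out" and y == q}
    while True:
        # ProvView(Q,X,L,Y) :- ProvView(Q,Y,_,_), g(X,L,Y).
        sources = {x for (x, _, _) in view}
        new = {(x, l, y) for (x, l, y) in g if y in sources} - view
        if not new:
            return view
        view |= new

view = prov_view(G, "tc(a,b)")
print(len(view), "of", len(G), "edges are relevant for tc(a,b)")
```

Everything related only to tc(c,d) is left out of the view, which is the point of a provenance view: the user sees only the part of the graph that matters for the atom being debugged.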

Page 23: Declarative Datalog Debugging for Mere Mortals

Declarative Profiling

P_rr:
    tc(X,Y) :- e(X,Y).
    tc(X,Y) :- e(X,Z), tc(Z,Y).

P_dr:
    tc(X,Y) :- e(X,Y).
    tc(X,Y) :- tc(X,Z), tc(Z,Y).

Page 24: Declarative Datalog Debugging for Mere Mortals

Declarative Profiling

• Number of Facts:

    derived(H) :- g(_,_,out,H).
    derivedHeadCount(C) :- C = count{ H : derived(H) }.

• Number of Firings:

    firing(F) :- g(_,F,out,_).
    firingCount(C) :- C = count{ F : firing(F) }.

[Figure: Firing graphs with stage numbers for the chain e(a,b), e(b,c), e(c,d), e(d,e).
(a) right-recursive: tc(a,b)[1], tc(a,c)[2], tc(a,d)[3], tc(a,e)[4]; tc(b,c)[1], tc(b,d)[2], tc(b,e)[3]; tc(c,d)[1], tc(c,e)[2]; tc(d,e)[1].
(b) doubly-recursive: tc(a,b)[1], tc(b,c)[1], tc(c,d)[1], tc(d,e)[1]; tc(a,c)[2], tc(b,d)[2], tc(c,e)[2]; tc(a,d)[3], tc(a,e)[3], tc(b,e)[3]; some firings occur only in stage 4.]

Page 25: Declarative Datalog Debugging for Mere Mortals

Declarative Profiling

• Number of Rederivations:

    reDerivation(S,F) :- g(S,F,out,A), len(A,LenA), LenA < S.
    reDerivCount(S,C) :- C = count{ F : reDerivation(S,F) }.
    reDerivTotal(T) :- T = sum{ C : reDerivCount(S,C) }.

• Schema-Level Profiling:
  – Number of new facts per relation used in each round to derive new facts

    factInRound(S,R,A) :- g(S,A,in,_), relName(A,R).
    factInRound(S1,R,A) :- g(S,_,out,A), next(S,S1), relName(A,R).
    newFact(S,R,A) :- g(S,_,out,A), not factInRound(S,R,A), relName(A,R).
    newFactsCount(S,R,C) :- C = count{ A : newFact(S,R,A) }.

Page 26: Declarative Datalog Debugging for Mere Mortals

Profiling Example: Transitive Closure

Right Recursive:
• 45 facts
• 45 rule firings
• 10 rounds
• 285 rederivations

Double Recursive:
• 45 facts
• 129 rule firings
• 6 rounds
• 325 rederivations

[Related factoid: a chain of length N has exactly one derivation in the right-recursive program but Catalan-number(N) many derivations in the doubly-recursive program!]
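
The fact and firing counts above can be checked with a short Python sketch. It rests on an assumption that the slide does not state: that the input is a chain with 10 nodes (9 edges), which is the size for which the 45/45 and 45/129 figures come out. The function `profile` is a hypothetical helper. It counts distinct firings only, so it does not reproduce the round or rederivation numbers.

```python
def profile(edges, doubly_recursive):
    """Return (number of tc facts, number of distinct rule firings)."""
    tc, firings = set(), set()
    while True:
        # r1: tc(X,Y) :- e(X,Y).
        fired = {("r1", x, y) for (x, y) in edges}
        # r2: first subgoal is e(X,Z) (right-recursive) or tc(X,Z) (doubly-recursive).
        left = tc if doubly_recursive else edges
        fired |= {("r2", x, z, y) for (x, z) in left for (z2, y) in tc if z == z2}
        if fired <= firings:                    # no new firing: fixpoint
            return len(tc), len(firings)
        firings |= fired
        tc |= {(f[1], f[-1]) for f in fired}    # head atom of each firing

chain = {(i, i + 1) for i in range(1, 10)}      # assumed input: 10 nodes, 9 edges
print(profile(chain, doubly_recursive=False))   # (45, 45)
print(profile(chain, doubly_recursive=True))    # (45, 129)
```

Both programs derive the same 45 facts, but the doubly-recursive one fires once for every intermediate node Z between X and Y, which gives 120 firings of r2 on top of the 9 firings of r1.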

Page 27: Declarative Datalog Debugging for Mere Mortals

Real-World Profiling Example

• Provenance-based profiling can explain real-world behavior

• E.g., realistic graph, ~1700 nodes, ~4000 edges:
  – doubly recursive trans. closure: >64M rule firings
  – right-recursive trans. closure: ~560K rule firings
  – explains execution time difference: >15 sec vs. 2.5 sec

Page 28: Declarative Datalog Debugging for Mere Mortals

Burden of Declarative Profiling

• The provenance transform incurs cost
  – e.g., for double-recursive tc: from ~15 sec to ~51 sec execution time

• Very high space cost is to be expected
• Just for transitive closure rule:
  – tc(X,Z) :- e(X,Y), tc(Y,Z)
    Need a 3-column table (X,Y,Z) instead of 2-column (X,Z) when recording provenance

• Approach will not scale to large provenance graphs
  – ... unless we invent specialized data structures or custom logic for provenance

Page 29: Declarative Datalog Debugging for Mere Mortals

Provenance in LogicBlox Datalog

• Provenance transformation is already implemented in LB Datalog

• add to any program:
    lang:provenance[] = true.
    lang:provenance:recordConstants[] = true.

• Produces new provenance relations, per original rule, capturing values in rule firing

• Can write queries using such relations, to implement our GPAD
  – Graph-based Provenance Analyzer and Debugger

Page 30: Declarative Datalog Debugging for Mere Mortals

Statelog in LogicBlox Datalog

• LB Datalog has no native Statelog support
• We simulate it in various ways, typically by introducing an explicit “time” dimension

• Affects performance significantly
  – extra factor of N to asymptotic complexity

• Recovering performance through “unsafe tricks”
  – Safe use of recursion through negation

Page 31: Declarative Datalog Debugging for Mere Mortals

Musings

Provenance capture: F, G, S
Provenance query: (g.u)+

Page 32: Declarative Datalog Debugging for Mere Mortals

“Elegance is not optional.” (Richard O’Keefe)

• There is no tension between writing a beautiful program and writing an efficient program. If your code is ugly, the chances are that you either don’t understand your problem or you don’t understand your programming language, and in neither case does your code stand much chance of being efficient. In order to ensure that your program is efficient, you need to know what it is doing, and if your code is ugly, you will find it hard to analyse.

Page 33: Declarative Datalog Debugging for Mere Mortals

Query “Macros” for Debugging, Profiling

• What is the provenance of atom A?
• Regular Path Query (RPQ): ans(X,Y) :- (X, (g.u)+, Y).
  – g = out⁻¹ (OPM’s “was-generated-by”)
  – u = in⁻¹ (OPM’s “used”)
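
A (g.u)+ query can be sketched as one backward step (atom to firing to subgoal atom) followed by a transitive closure. The Python below is only an illustration on a hand-written firing graph for the running example. The names `firing`, `G`, and `rpq_gu_plus` are hypothetical, and nodes are plain strings.

```python
def firing(name, body, head):
    """Edges of one firing node: an in-edge per subgoal, one out-edge to the head."""
    return [(b, "in", name) for b in body] + [(name, "out", head)]

G = set(
    firing("r1(a,b)", ["e(a,b)"], "tc(a,b)")
    + firing("r1(c,b)", ["e(c,b)"], "tc(c,b)")
    + firing("r2(b,c,b)", ["e(b,c)", "tc(c,b)"], "tc(b,b)")
    + firing("r2(a,b,b)", ["e(a,b)", "tc(b,b)"], "tc(a,b)")
    + firing("r2(c,b,b)", ["e(c,b)", "tc(b,b)"], "tc(c,b)")
)

def rpq_gu_plus(graph):
    """ans(X,Y) :- (X, (g.u)+, Y), with g = inverse of out and u = inverse of in."""
    # One g.u step: atom X was generated by firing F, and F used atom Y.
    step = {(x, y)
            for (f, l1, x) in graph if l1 == "out"
            for (y, l2, f2) in graph if l2 == "in" and f2 == f}
    ans = set(step)
    while True:
        new = {(x, z) for (x, y) in ans for (y2, z) in step if y == y2} - ans
        if not new:
            return ans
        ans |= new

ans = rpq_gu_plus(G)
print(sorted(y for (x, y) in ans if x == "tc(a,b)"))
```

In this graph the pair (tc(b,b), tc(b,b)) is also an answer: tc(b,b) lies on a cycle of the firing graph, which is the situation behind the unfounded firing [r2] → tc(c,b) discussed earlier.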

Page 34: Declarative Datalog Debugging for Mere Mortals

More musings: Many gurus are better than one!

• Are we too fragmented? Defragment your mind!
  – Discover the homomorphisms, relationships between subcommunities!

• Look around, be promiscuous, interbreed!
  – idea-wise I mean!

• For example, look at …
  – Theorem proving
  – Declarative LP semantics (e.g. well-founded models)
  – Procedural/production rule semantics (e.g. inflationary)
    ▪ Fixpoint logics
  – PL e.g. Functional programming

Page 35: Declarative Datalog Debugging for Mere Mortals

Hamming Numbers in a Dataflow Network (= executable Kepler workflow)

Compute Hamming numbers H in order, where H = { 2^i · 3^j · 5^k | i, j, k ≥ 0 },

a.k.a. regular numbers or 5-smooth numbers (numbers whose prime factors are <= 5).
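
For readers who want to see the numbers themselves, here is a minimal Python sketch that enumerates Hamming numbers in increasing order. It is not the Kepler dataflow network from the talk. It is a plain priority-queue formulation, and the function name `hamming` is chosen here for illustration.

```python
import heapq

def hamming(n):
    """Return the first n Hamming numbers in increasing order."""
    heap, seen, out = [1], {1}, []
    while len(out) < n:
        h = heapq.heappop(heap)          # smallest number not yet emitted
        out.append(h)
        for p in (2, 3, 5):
            if h * p not in seen:        # skip re-derivations such as 6 = 2*3 = 3*2
                seen.add(h * p)
                heapq.heappush(heap, h * p)
    return out

print(hamming(10))  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12]
```

The `seen` check is where re-derivations show up: the same number can be reached along several multiplication paths, which is the effect the provenance graphs on the following slides make visible.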

Page 36: Declarative Datalog Debugging for Mere Mortals
Page 37: Declarative Datalog Debugging for Mere Mortals

Hamming “3-loops”

Hamming “1-loop”

Page 38: Declarative Datalog Debugging for Mere Mortals

Hamming Traces: “Debugged”

[Figure: two provenance graphs whose nodes are labeled with Hamming numbers between 1 and 1000.]

Provenance of H1 (“Fish”)

Provenance of H3 (“Sail”)

For each H-number, there are many paths
→ many re-derivations!

For each H-number, there is exactly one path to the root = unique derivation!

Datalog as a Lingua Franca for Provenance Querying and Reasoning, Dey et al, TaPP’12

Page 39: Declarative Datalog Debugging for Mere Mortals

Conclusions

• Declarative Debugging for the rest of us!
  – Simple program transformations: P → {F, G, S} → P’
  – Apply Datalog queries (RPQ, aggregation, …) on M_{P’}
  – “Turn Datalog on itself!”

• Prototypical implementations underway:
  – GPad/DLV
    ▪ uses SWI-Prolog as glue
    ▪ … but could benefit e.g. from DLV IDE and tools!
  – GPad/LB
    ▪ uses LogicBlox platform and tools (MoreBlox, …)

• Coming up next:
  – Finish GPad(s), add library of common queries (RPQ, LCA, …)
  – Clarify connection to provenance semirings
  – Extending to Datalog-neg, why-not, …