Introduction to Tree Grammar Compression
Yoh Okuno

Summary
● A trie contains repeated substructures.
● Idea: share frequent patterns inside the trie.
● Approach: extend grammar compression to trees.
● TreeRePair: detect frequent pairs recursively.

Method            TreeRePair   Succinct   Pointer
Size (relative)   1/43         1/7        1
Time (relative)   14           3          1

[Lohrey 2011]

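Background: Repeated Substructures in Tries

● going
● having
● making

● pretended
● attendance
● tendency

The first group ends in the same suffix "ing", so its trie repeats the subtree i-n-g under different prefixes; the second group shares the fragment "tend" at different positions in each word.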
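A small sketch of the repeated-substructure idea, assuming a trie stored as nested Python dicts and a hypothetical "$" end-of-word marker. It serializes every subtree canonically and reports the serializations that occur more than once.

```python
from collections import Counter

END = "$"  # end-of-word marker (an assumption of this sketch)


def build_trie(words):
    """Build a trie as nested dicts: each key is one character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def signature(node, counter):
    """Serialize the subtree below `node` canonically and count each serialization."""
    sig = ",".join(
        f"{label}({signature(child, counter)})" for label, child in sorted(node.items())
    )
    counter[sig] += 1
    return sig


def repeated_subtrees(words):
    counter = Counter()
    signature(build_trie(words), counter)
    # drop the trivial subtrees (empty leaf, lone end marker) that every word has
    return {s: n for s, n in counter.items() if n > 1 and s not in ("", f"{END}()")}


print(repeated_subtrees(["going", "having", "making", "pretended", "attendance", "tendency"]))
```

In this sketch only the "ing" words produce repeated subtrees (i-n-g, n-g and g, three times each). The "tend" words share a path fragment that is followed by different continuations, so no whole subtree repeats. Capturing such fragments needs patterns with parameters rather than plain subtree sharing.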

RePair: String Grammar Compression [Larsson 2000]

Represent a string by a small context-free grammar that generates only that string.

Initialize the grammar to the input
Loop:
    Find the most frequent pair of adjacent symbols
    Replace the pair by a new symbol
    Add a new rule to the grammar G
Until no repeated pair exists
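The loop above can be sketched in Python as follows. This is a naive version for illustration, not the linear-time algorithm of [Larsson 2000]. The rule names R0, R1, ... and the left-to-right, non-overlapping pair counting are assumptions of this sketch.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, left to right."""
    counts = Counter()
    last_start = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_start.get(pair) == i - 1:  # overlaps the previous one, as in "aaa"
            continue
        counts[pair] += 1
        last_start[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of `pair` left to right by `symbol`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    seq, rules = list(text), {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:  # no repeated pair left
            break
        symbol = f"R{len(rules)}"
        rules[symbol] = pair  # new rule: symbol -> pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Decompress one symbol back into the string it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


seq, rules = repair("abababab")
print(seq, rules)  # ['R1', 'R1'] {'R0': ('a', 'b'), 'R1': ('R0', 'R0')}
```

Each round shortens the sequence, so the loop always terminates, and expanding the final sequence returns the input.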

TreeRePair: Tree Grammar Compression [Lohrey 2011]

Represent a tree by a small context-free tree grammar that generates only that tree.

Initialize the grammar to the input
Loop:
    Find the most frequent pair of adjacent nodes (a parent and one of its children)
    Replace the pair by a new symbol
    Add a new rule to the grammar G
Until no repeated pair exists
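The same loop on ranked trees, as a hedged Python sketch. Here a "pair of adjacent nodes" is taken to be a digram (parent label a, child index i, child label b). The new symbol gets rank(a) + rank(b) - 1 parameters, and occurrences in chains such as C(C(C(...))) are counted greedily top-down so they do not overlap. Breaking ties toward the smaller new rank is an assumption made here. The Node class, the parser, and the names X0, X1, ... and y1, y2, ... are hypothetical. The pruning step and rank limits of the real TreeRePair are omitted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: nodes hash by identity, so equal subtrees stay distinct
class Node:
    label: str
    children: list = field(default_factory=list)


def parse(s):
    """Parse a term such as 'f(a,f(c,b))' into Nodes (hypothetical helper)."""
    pos = 0

    def term():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        node = Node(s[start:pos].strip())
        if pos < len(s) and s[pos] == "(":
            pos += 1  # skip '('
            node.children.append(term())
            while s[pos] == ",":
                pos += 1  # skip ','
                node.children.append(term())
            pos += 1  # skip ')'
        return node

    return term()


def show(n):
    if not n.children:
        return n.label
    return n.label + "(" + ",".join(show(c) for c in n.children) + ")"


def collect_ranks(n, ranks):
    ranks[n.label] = len(n.children)
    for c in n.children:
        collect_ranks(c, ranks)


def count_digrams(root):
    """Count non-overlapping digrams (parent label, child index, child label) top-down."""
    counts, consumed = Counter(), {}  # consumed: digram -> nodes used as its child
    stack = [root]
    while stack:
        p = stack.pop()  # pre-order: parents are visited before children
        for i, c in enumerate(p.children):
            d = (p.label, i, c.label)
            used = consumed.setdefault(d, set())
            if p not in used:  # skip if p is already the child of an occurrence of d
                counts[d] += 1
                used.add(c)
        stack.extend(reversed(p.children))
    return counts


def make_rule(sym, digram, ranks):
    """Rule X(y1..yn) -> a(..., b(...), ...) with n = rank(a) + rank(b) - 1."""
    a, i, b = digram
    ra, rb = ranks[a], ranks[b]
    ys = [f"y{j}" for j in range(1, ra + rb)]
    inner = ys[i:i + rb]
    b_term = b + ("(" + ",".join(inner) + ")" if inner else "")
    args = ys[:i] + [b_term] + ys[i + rb:]
    head = sym + ("(" + ",".join(ys) + ")" if ys else "")
    return f"{head} -> {a}({','.join(args)})"


def replace(n, digram, sym):
    """Replace occurrences top-down; the child's children are spliced into the new node."""
    a, i, b = digram
    if n.label == a and i < len(n.children) and n.children[i].label == b:
        c = n.children[i]
        n = Node(sym, n.children[:i] + c.children + n.children[i + 1:])
    n.children = [replace(ch, digram, sym) for ch in n.children]
    return n


def tree_repair(root):
    rules = []
    while True:
        ranks = {}
        collect_ranks(root, ranks)
        counts = count_digrams(root)
        if not counts:
            break
        # most frequent digram; ties go to the smaller new rank (an assumption)
        best = max(counts, key=lambda d: (counts[d], -(ranks[d[0]] + ranks[d[2]] - 1)))
        if counts[best] < 2:
            break
        sym = f"X{len(rules)}"
        rules.append(make_rule(sym, best, ranks))
        root = replace(root, best, sym)
    return root, rules
```

On the tree of Example 1, this sketch yields rules X0 to X3 that correspond to A to D, with the start tree X3(X3(b)).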

Example 1 - Step 1

S -> f(a,f(c,f(a,f(c,f(a,f(c,f(a,f(c,b))))))))

[Figure: the tree generated by S, a chain of f nodes whose left children alternate a, c, a, c and whose last right child is b]

Example 1 - Step 2

S -> A(f(c,A(f(c,A(f(c,A(f(c,b))))))))
A(y) -> f(a,y)

[Figure: the tree derived from S, in which every f with left child a has been replaced by A]

Example 1 - Step 3

S -> A(B(A(B(A(B(A(B(b))))))))
A(y) -> f(a,y)
B(y) -> f(c,y)

[Figure: a chain of nodes A, B, A, B, A, B, A, B ending in the leaf b]

Example 1 - Step 4

S -> C(C(C(C(b))))
A(y) -> f(a,y)
B(y) -> f(c,y)
C(y) -> A(B(y))

[Figure: a chain of four C nodes ending in the leaf b]

Example 1 - Finished

S -> D(D(b))
A(y) -> f(a,y)
B(y) -> f(c,y)
C(y) -> A(B(y))
D(y) -> C(C(y))

[Figure: a chain of two D nodes ending in the leaf b]
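Because every rule here has one parameter y, each rule can be read as a function, and expanding S is plain function composition. A minimal Python check, assuming terms are represented as strings, confirms that the finished grammar still generates the input tree.

```python
# Each rule of Example 1 as a function from its argument y to a term string.
def A(y):
    return f"f(a,{y})"


def B(y):
    return f"f(c,{y})"


def C(y):
    return A(B(y))


def D(y):
    return C(C(y))


S = D(D("b"))
print(S)  # f(a,f(c,f(a,f(c,f(a,f(c,f(a,f(c,b))))))))
```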

Experiment

Dataset: average over 24 XML documents
Time: pre-order traversal time
Size: memory consumption
OSS: https://code.google.com/p/treerepair/

Method      TreeRePair   DAG     Succinct   Pointer
Time (ms)   771          3,220   164        56
Size (KB)   463          3,070   2,724      19,995

[Lohrey 2011]
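The relative figures on the Summary slide appear to follow from this table by dividing by the Pointer column. A quick check in Python, using only the numbers in the table:

```python
# Numbers copied from the experiment table [Lohrey 2011].
time_ms = {"TreeRePair": 771, "DAG": 3220, "Succinct": 164, "Pointer": 56}
size_kb = {"TreeRePair": 463, "DAG": 3070, "Succinct": 2724, "Pointer": 19995}


def relative(method):
    """Size as a fraction 1/k of Pointer, and time as a multiple of Pointer."""
    k = size_kb["Pointer"] / size_kb[method]
    slowdown = time_ms[method] / time_ms["Pointer"]
    return round(k), round(slowdown)


for m in ("TreeRePair", "Succinct"):
    k, slowdown = relative(m)
    print(f"{m}: size 1/{k}, time x{slowdown}")
# TreeRePair: size 1/43, time x14
# Succinct: size 1/7, time x3
```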

DAG as Tree Grammar
● Sharing subtrees can compress a trie.
● The result is a DAG (directed acyclic graph).
● A DAG can be seen as a kind of tree grammar.

S -> a(b(D), c(D))
D -> d(e,f)

Symbols have no parameters.

[Figure: the tree a(b(d(e,f)), c(d(e,f))) drawn as a DAG in which b and c point to a single shared d(e,f)]
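A sketch of how such a parameterless grammar can be obtained by subtree sharing, assuming trees are (label, children) tuples so that equal subtrees compare equal. Every non-leaf subtree that occurs more than once becomes a rule. The rule names N0, N1, ... are hypothetical, and leaves are left inline for readability, although a true minimal DAG would share them too.

```python
from collections import Counter


def node(label, *children):
    """A tree as a (label, children) tuple, so equal subtrees compare equal."""
    return (label, tuple(children))


def dag_grammar(root):
    """Share every repeated non-leaf subtree as a parameterless rule N0, N1, ..."""
    counts = Counter()

    def count(t):
        counts[t] += 1
        for c in t[1]:
            count(c)

    count(root)
    names, rules = {}, {}

    def show(t, is_body=False):
        label, children = t
        if not is_body and children and counts[t] > 1:
            if t not in names:
                names[t] = f"N{len(names)}"
                rules[names[t]] = show(t, is_body=True)
            return names[t]
        if not children:
            return label
        return label + "(" + ",".join(show(c) for c in children) + ")"

    return show(root, is_body=True), rules


tree = node("a", node("b", node("d", node("e"), node("f"))),
                 node("c", node("d", node("e"), node("f"))))
print(dag_grammar(tree))  # ('a(b(N0),c(N0))', {'N0': 'd(e,f)'})
```

Up to the rule name, this reproduces S -> a(b(D), c(D)) and D -> d(e,f).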

Distinguish shared nodes
How can shared nodes be told apart?
● Use pre-order identifiers from the original tree.
● Store the size of the subtree generated by each rule.
● Sum the sizes of skipped subtrees during search.

Example 2

S -> a(B,e(B))    |S| = 8
B -> b(c,d)       |B| = 3

[Figure: the tree a(b(c,d), e(b(c,d))) with pre-order identifiers a=0, b=1, c=2, d=3, e=4, b=5, c=6, d=7]
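A sketch of the search described on the previous slide, applied to Example 2. It stores the size of each rule, then walks down from S, skipping whole child subtrees by their sizes until it reaches the node with the requested pre-order identifier. The tuple encoding and the function names are hypothetical.

```python
# Example 2 as rule bodies: a tree is (label, children); a leaf named after a
# rule ("B") is a reference to that rule. This encoding is hypothetical.
RULES = {
    "S": ("a", (("B", ()), ("e", (("B", ()),)))),
    "B": ("b", (("c", ()), ("d", ()))),
}


def size_of(t, rules, sizes):
    """Number of nodes generated by t; sizes of rules are cached in `sizes`."""
    label, children = t
    if label in rules and not children:  # reference to a rule
        if label not in sizes:
            sizes[label] = size_of(rules[label], rules, sizes)
        return sizes[label]
    return 1 + sum(size_of(c, rules, sizes) for c in children)


def label_at(k, rules, sizes, start="S"):
    """Label of the node whose pre-order id in the original tree is k."""
    t = rules[start]
    while k > 0:
        k -= 1  # step past the current node
        for child in t[1]:
            s = size_of(child, rules, sizes)
            if k < s:  # the target lies inside this child
                t = child
                break
            k -= s  # skip the whole child subtree
        else:
            raise IndexError("pre-order id out of range")
        while t[0] in rules and not t[1]:  # follow a shared rule
            t = rules[t[0]]
    return t[0]


sizes = {}
size_of(("S", ()), RULES, sizes)
print(sizes)  # {'B': 3, 'S': 8}
print([label_at(k, RULES, sizes) for k in range(8)])
# ['a', 'b', 'c', 'd', 'e', 'b', 'c', 'd']
```

The two copies of b are told apart only by their identifiers (1 and 5), which the walk recovers from the stored sizes without expanding the grammar.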

Example 3

S -> a(B(d,e(f)),g(B(h,i)))    |S| = 11
B(x,y) -> b(c(x,y))            |B(x,y)| = |x| + |y| + 2

[Figure: the tree a(b(c(d,e(f))), g(b(c(h,i)))) with pre-order identifiers 0 to 10]
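With parameters, a rule no longer has a fixed size, but the formula |B(x,y)| = |x| + |y| + 2 suggests storing only the rule's own node count and adding the argument sizes at each use. A minimal Python sketch of that computation for Example 3, assuming a hypothetical tuple encoding in which parameter names never clash with node labels:

```python
# Example 3: a rule is (parameters, body); a leaf named like a parameter is a
# placeholder. Terms are (label, children) tuples; the encoding is hypothetical.
RULES = {
    "B": (("x", "y"), ("b", (("c", (("x", ()), ("y", ()))),))),
}
START = ("a", (("B", (("d", ()), ("e", (("f", ()),)))),
               ("g", (("B", (("h", ()), ("i", ()))),))))


def own_size(name, rules):
    """Nodes a rule contributes by itself, excluding its arguments."""
    params, body = rules[name]

    def count(t):
        label, children = t
        if label in params and not children:
            return 0  # argument nodes are counted at the call site
        base = own_size(label, rules) if label in rules else 1
        return base + sum(count(c) for c in children)

    return count(body)


def size(t, rules):
    """Size of the tree generated by t: |B(x,y)| = own_size(B) + |x| + |y|."""
    label, children = t
    base = own_size(label, rules) if label in rules else 1
    return base + sum(size(c, rules) for c in children)


print(own_size("B", RULES), size(START, RULES))  # 2 11
```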

Open questions
● When should nodes be shared, and when not?
    ○ Pruning / rank / overlap variations
    ○ Finding the smallest grammar is NP-hard [Charikar 2005]
● How can a tree grammar be encoded efficiently?
    ○ Compression and encoding are separate problems
    ○ Extend the succinct CFG [Tabei 2013] to trees?
● Can we use a deterministic CFG for strings?
    ○ Construct a DCFG that accepts the words in the lexicon
    ○ Use a shift-reduce parser to test membership
    ○ Start symbols serve as unique word identifiers


(ノ゚Д)ノ== ┻━┻

Fin.

References
● [Lohrey 2010] Tree Structure Compression with RePair.
● [Larsson 2000] Off-Line Dictionary-Based Compression.
● [Charikar 2005] The Smallest Grammar Problem.
● [Tabei 2013] A Succinct Grammar Compression.