Modeling Delta Encoding of Compressed Files S.T. Klein, T.C. Serebro, D. Shapira


Dec 16, 2015

Page 1: Modeling Delta Encoding of Compressed Files S.T. Klein, T.C. Serebro, D. Shapira.

Modeling Delta Encoding of Compressed Files

S.T. Klein, T.C. Serebro, D. Shapira

Page 2

Delta Encoding

Example:

S = The Prague Stringology Club

T = The Prague Stringology Conference 06

Δ = (1, 24) onferenc (3, 2) 06
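To make the notation concrete, here is a minimal Python sketch of how a delta of this shape could be applied to S to recover T. The function name `apply_delta` and the list representation of Δ are assumptions made for illustration; the slides do not prescribe a data format. Positions are counted from 1, as on the slide.

```python
def apply_delta(source, delta):
    """Rebuild the target from the source and a delta.

    Each delta item is either a (pos, length) pair pointing into the
    source (pos counted from 1, as on the slide) or a literal string.
    """
    out = []
    for item in delta:
        if isinstance(item, tuple):
            pos, length = item
            out.append(source[pos - 1:pos - 1 + length])
        else:
            out.append(item)
    return "".join(out)


S = "The Prague Stringology Club"
delta = [(1, 24), "onferenc", (3, 2), "06"]
print(apply_delta(S, delta))  # The Prague Stringology Conference 06
```

The pair (1, 24) copies "The Prague Stringology C" and (3, 2) copies "e " from S; everything else is sent as literal characters.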

Page 3

Compressed Differencing

Goal: create a delta file of S and T without decompressing the compressed files.

Delta encoding: S, T → Δ(S,T)

Semi Compressed Differencing: E(S), T → Δ(S,T)

Full Compressed Differencing: E(S), E(T) → Δ(S,T)

Page 4

LZW compression

STR = input character
WHILE there are input characters {
    C = input character
    IF STR C is in T then
        STR = STR C
    ELSE {
        output the code for STR
        add STR C to T
        STR = C
    }
}
output the code for STR

Page 5

Example

S = abccbaaabccba

E(S) = 1233219571
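The LZW loop can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name `lzw_compress`, the `alphabet` parameter, and the choice to number the initial single-character entries from 1 (a = 1, b = 2, c = 3) are assumptions inferred from the example codes. The dictionary that the pseudocode calls T is named `table` here to avoid confusion with the target file T.

```python
def lzw_compress(text, alphabet):
    """Return the list of LZW codes for text.

    The table starts with one entry per alphabet character,
    numbered from 1; new strings get the next free code.
    """
    if not text:
        return []
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(table) + 1
    codes = []
    current = text[0]
    for ch in text[1:]:
        if current + ch in table:
            # Keep extending the current match.
            current += ch
        else:
            # Emit the longest match and remember its one-character extension.
            codes.append(table[current])
            table[current + ch] = next_code
            next_code += 1
            current = ch
    codes.append(table[current])
    return codes


print(lzw_compress("abccbaaabccba", "abc"))
# [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
```

Written without separators, the printed codes are the string 1233219571 shown for E(S).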

Page 6

Semi Compressed Differencing Algorithm

construct the trie of E(S)
i ← 1
while i ≤ u {
    P ← T_i T_{i+1} ... T_u
    Starting at the root, traverse the trie using P
    When a leaf v is reached
        k ← depth of v in trie
        output the position in S corresponding to v
    i ← i + k
}

Page 7

Example

E(S) = 1233219571, T = ccbbabccbabccbba

Δ(S,T) = (3,2) b (5,2) (9,3) (5,2) (9,3) b (5,2)
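The semi compressed case can be sketched in Python as below. Several things here are assumptions made for illustration: the function names, the use of a plain dictionary of strings in place of an explicit trie (the LZW dictionary is prefix-closed, so extending a match one character at a time visits the same strings a trie walk would), the alphabet a, b, c with codes 1 to 3, and the rule that a character with no multi-character match is emitted as a literal, which is inferred from the example output rather than stated on the slides.

```python
def lzw_dictionary_positions(codes, alphabet):
    """Replay the LZW decoder on E(S) and record, for every string added
    to the dictionary, the 1-based position in S where it starts."""
    table = {i + 1: ch for i, ch in enumerate(alphabet)}
    positions = {}
    next_code = len(alphabet) + 1
    pos = 1
    prev = table[codes[0]]
    for code in codes[1:]:
        if code in table:
            cur = table[code]
        else:
            # The code refers to the entry that is being created right now.
            cur = prev + prev[0]
        new_entry = prev + cur[0]
        table[next_code] = new_entry
        positions[new_entry] = pos
        next_code += 1
        pos += len(prev)
        prev = cur
    return positions


def semi_compressed_delta(codes, target, alphabet="abc"):
    """Greedily cover target with dictionary strings of E(S)."""
    positions = lzw_dictionary_positions(codes, alphabet)
    delta = []
    i = 0
    while i < len(target):
        best = None
        length = 2
        # Extend the match while it is still a dictionary string.
        while i + length <= len(target) and target[i:i + length] in positions:
            best = target[i:i + length]
            length += 1
        if best is None:
            delta.append(target[i])  # no match longer than one character
            i += 1
        else:
            delta.append((positions[best], len(best)))
            i += len(best)
    return delta


E_S = [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
print(semi_compressed_delta(E_S, "ccbbabccbabccbba"))
# [(3, 2), 'b', (5, 2), (9, 3), (5, 2), (9, 3), 'b', (5, 2)]
```

Only the code sequence E(S) is read; S itself is never written out, which is the point of the semi compressed setting.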

Page 8

Full Compressed Differencing Algorithm

construct the trie of E(S)
flag ← 0        // output character k
counter ← 1     // position in T
input oldcw from E(T)
while oldcw ≠ NULL {        // still processing E(T)
    input cw from E(T)
    node ← Dictionary[oldcw]
    if (Dictionary[cw] ≠ NULL)
        k ← first character of string corresponding to Dictionary[cw]
    else
        k ← first character of string corresponding to node
    if ((node has a child k) and (cw ≠ NULL))
        output (pos+flag, len-flag) corresponding to child k of node
        flag ← 1
    else
        output (pos+flag, len-flag) corresponding to node
        create a new child of node corresponding to k
        flag ← 0
    pos of child k of node ← counter
    oldcw ← cw
    counter ← counter + len - flag
}

Page 9

Example

E(S) = 1233219571

E(T) = 33221247957

Page 10

Example

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T =

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = c, oldcw = 3

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = cc, oldcw = 3, cw = 3

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = cc, oldcw = 3, cw = 3, k = c

3

Page 11

Example

4(1,2,c)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = cc, oldcw = 3, cw = 3, k = c

Δ(S,T) = <3, 2>

3

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = cc, oldcw = 3, cw = 3, k = c

Page 12

Example

4(1,2,c)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = cc, oldcw = 3, cw = 3, flag = 1, k = c

Δ(S,T) = <3, 2>

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccb, oldcw = 3, cw = 3, flag = 1, k = c

Page 13

Example

4(1,2,c)

Δ(S,T)=<3, 2>

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccb, oldcw = 3, cw = 2, flag = 1, k = c

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccb, oldcw = 3, cw = 2, flag = 1, k = b

<5, 1>

5(2,2,c)

Page 14

Example

4(1,2,c)

Δ(S,T)=<3, 2>

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccb, oldcw = 3, cw = 2, flag = 1, k = b

<5, 1>

5(2,2,c)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbb, oldcw = 3, cw = 2, flag = 1, k = b

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbb, oldcw = 2, cw = 2, flag = 1, k = b

6(3,2,b)

b

<b, 0>

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbb, oldcw = 2, cw = 2, flag = 0, k = b

Page 15

Example

4(1,2,c)

5(2,2,c)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbb, oldcw = 2, cw = 2, flag = 0, k = b

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbba, oldcw = 2, cw = 2, flag = 0, k = b

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbba, oldcw = 2, cw = 1, flag = 0, k = a

6(3,2,b)

Δ(S,T)=<3, 2> <5, 1>

4(1,2,c)

5(2,2,c)

7(4,2,b)

<5, 2>

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbba, oldcw = 2, cw = 1, flag = 1, k = a

b

Page 16

Example

4(1,2,c)

Δ(S,T)=<3, 2> <5, 1>

5(2,2,c)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbba, oldcw = 2, cw = 1, flag = 1, k = a

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbab, oldcw = 2, cw = 1, flag = 1, k = a

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbab, oldcw = 1, cw = 2, flag = 1, k = b

6(3,2,b)

4(1,2,c)

5(2,2,c)

7(4,2,b)

<5, 2>

8(5,2,a)

<2,1>

b

Page 17

Example

4(1,2,c)

Δ(S,T)=<3, 2> <5, 1>

5(2,2,c)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbabcc, oldcw = 2, cw = 4, flag = 1, k = c

6(3,2,b)

b

4(1,2,c)

5(2,2,c)

7(4,2,b)

<5, 2>

8(5,2,a)

<2,1>

9(6,2,b)

<3, 1>

Page 18

Example

4(1,2,c)

Δ(S,T)=<3, 2> <5, 1>

5(2,2,c)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbabccba, oldcw = 4, cw = 7, flag = 1, k = b

6(3,2,b)

7(4,2,b)

<5, 2>

8(5,2,a)

<2,1>

9(6,2,b)

<3, 1>

10(7,3,c)

b

(2, 1)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbabccba, oldcw = 4, cw = 7, flag = 0, k = b

b

Page 19

Example

4(1,2,c)

Δ(S,T)=<3, 2> <5, 1>

5(2,2,c)

6(3,2,b)

b

7(4,2,b)

<5, 2>

8(5,2,a)

<2,1>

9(6,2,b)

<3, 1>

10(7,3,c)

b

(2, 1)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbabccbabc, oldcw = 7, cw = 9, flag = 0, k = b

11(9,3,b)

b

(4, 2)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbabccbabccb, oldcw = 9, cw = 5, flag = 0, k = c

<9, 3>

12(11,3,b)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbabccbabccb, oldcw = 9, cw = 5, flag = 1, k = c

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbabccbabccbba, oldcw = 5, cw = 7, flag = 1, k = b

13(13,3,c)

b

(3, 1)

E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba, T = ccbbabccbabccbba, oldcw = 7, cw = NULL, flag = 0, k = b

(4, 2)

Page 20

Combination of Pairs

Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1> <3, 1> (2, 1) (4, 2) <9, 3> (3, 1) (4, 2)

S = abccbaaabccba

<3, 2> <5, 1> → <3, 3>

If two consecutive ordered pairs are of the form <i, l_1> and <i + l_1, l_2>, we combine them into a single ordered pair <i, l_1 + l_2>.

Page 21

Combination of Pairs

If two consecutive ordered pairs are of the form <i, l_1> and <i + l_1, l_2>, we combine them into a single ordered pair <i, l_1 + l_2>.

Δ(S,T) = <3, 3> <5, 2> <2, 1> <3, 1> (2, 1) (4, 2) <9, 3> (3, 1) (4, 2)

S = abccbaaabccba

<2, 1> <3, 1> → <2, 2>

Δ(S,T) = <3, 3> <5, 2> <2, 2> c (4, 2) <9, 3> b (4, 2)
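The combination rule can be sketched in Python as below. The tuple layout `(file, pos, length)` and the name `combine_pairs` are assumptions made for illustration. The sketch merges only pairs that point into S (written with angle brackets on the slides), because that is what the worked example does; whether pairs pointing into T are also meant to be merged is not stated.

```python
def combine_pairs(delta):
    """Merge consecutive S-pairs <i, l1> and <i + l1, l2> into <i, l1 + l2>.

    Each item is a tuple (file, pos, length) where file is "S" or "T".
    """
    merged = []
    for item in delta:
        if (merged
                and item[0] == "S"
                and merged[-1][0] == "S"
                and merged[-1][1] + merged[-1][2] == item[1]):
            _, pos, length = merged[-1]
            merged[-1] = ("S", pos, length + item[2])
        else:
            merged.append(item)
    return merged


delta = [("S", 3, 2), ("S", 5, 1), ("S", 5, 2), ("S", 2, 1), ("S", 3, 1),
         ("T", 2, 1), ("T", 4, 2), ("S", 9, 3), ("T", 3, 1), ("T", 4, 2)]
for item in combine_pairs(delta):
    print(item)
```

On this input the first two pairs collapse to <3, 3> and the fourth and fifth to <2, 2>, leaving eight items. Comparing each new pair with the last merged pair means a run of three or more adjacent pairs also collapses into one.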

Page 22

Encoding the delta file

Δ(S,T) = <3, 3> <5, 2> <2, 2> c (4, 2) <9, 3> b (4, 2)

File consists of:

(pos, len) in S

(pos, len) in T

Characters

flags

Page 23

Experiments:

S = xfig.3.2.1, T = xfig.3.2.2

|T| = 812K
|Gzip(T)| = 325K
|LZW(T)| = 497K
|Δ(S,T)| 3K