Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller

Dec 13, 2015

Antonia Dawson
Transcript
Page 1: Mining for Empty Rectangles in Large Data Sets Jeff Edmonds Jarek Gryz Dongming Liang Renee Miller.

Mining for Empty Rectangles in Large Data Sets

Jeff Edmonds

Jarek Gryz

Dongming Liang

Renee Miller

Page 2

Matrix representation

Tuples of π_{A,B}(R ⋈ S):

A  B
3  6
1  7
3  8

Matrix (columns: A = 1, 2, 3; rows: B = 6, 7, 8; an entry is 1 where the tuple is present):

      A=1  A=2  A=3
B=6    0    0    1
B=7    1    0    0
B=8    0    0    1
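The matrix representation can be sketched in Python. This is a minimal illustration, not code from the presentation: the function name `to_matrix` is hypothetical, and reading the slide's table as the tuples (3, 6), (1, 7), (3, 8) over A = 1, 2, 3 and B = 6, 7, 8 is an interpretation of the transcript.

```python
def to_matrix(tuples, a_values, b_values):
    """Return a 0-1 matrix with one row per B value and one column per A value."""
    col = {a: i for i, a in enumerate(a_values)}
    row = {b: j for j, b in enumerate(b_values)}
    matrix = [[0] * len(a_values) for _ in b_values]
    for a, b in tuples:
        matrix[row[b]][col[a]] = 1  # mark the (A, B) combination as present
    return matrix


# Assumed reading of the slide: tuples (A, B) of the projected join.
join_result = [(3, 6), (1, 7), (3, 8)]
matrix = to_matrix(join_result, a_values=[1, 2, 3], b_values=[6, 7, 8])
for r in matrix:
    print(*r)
```

Under these assumptions the printed rows match the 0-1 matrix shown on the slide.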

Page 3

Find All Maximal 0-Rectangles

[Figure: the tuples and 0-1 matrix of π_{A,B}(R ⋈ S) from the previous page (A = 1, 2, 3; B = 6, 7, 8; rows 0 0 1 / 1 0 0 / 0 0 1), with its maximal 0-rectangles marked.]
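To make the problem statement concrete, here is a brute-force sketch that enumerates maximal 0-rectangles straight from the definition: a rectangle of 0s is maximal if it cannot grow by one row or one column in any direction. It is deliberately naive and is not the algorithm presented in these slides. The function names and the `(x_left, y_top, x_right, y_bottom)` output convention (0-based indices) are assumptions made for illustration.

```python
from itertools import product


def is_empty(m, x1, y1, x2, y2):
    """True if every cell in columns x1..x2 and rows y1..y2 is 0."""
    return all(m[y][x] == 0 for y in range(y1, y2 + 1) for x in range(x1, x2 + 1))


def maximal_zero_rectangles_bruteforce(m):
    """Enumerate maximal 0-rectangles as (x_left, y_top, x_right, y_bottom)."""
    rows, cols = len(m), len(m[0])
    found = []
    for y1, x1 in product(range(rows), range(cols)):
        for y2, x2 in product(range(y1, rows), range(x1, cols)):
            if not is_empty(m, x1, y1, x2, y2):
                continue
            # Maximal means no neighbouring row or column is entirely 0.
            grows = (
                (x1 > 0 and is_empty(m, x1 - 1, y1, x1 - 1, y2))
                or (x2 < cols - 1 and is_empty(m, x2 + 1, y1, x2 + 1, y2))
                or (y1 > 0 and is_empty(m, x1, y1 - 1, x2, y1 - 1))
                or (y2 < rows - 1 and is_empty(m, x1, y2 + 1, x2, y2 + 1))
            )
            if not grows:
                found.append((x1, y1, x2, y2))
    return found


example = [[0, 0, 1],
           [1, 0, 0],
           [0, 0, 1]]
print(maximal_zero_rectangles_bruteforce(example))
```

On the 3×3 example matrix this finds four maximal 0-rectangles: two horizontal pairs in the first and last rows, one horizontal pair in the middle row, and the full middle column.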

Page 4

Example

π_{A,B}(R ⋈ S), with Year on one axis and Car on the other:

            95  96  97
BMW Z3       0   0   1
Honda L2     1   0   0
Toyota 6A    0   0   1

First BMW Z3 series cars were made in 1997.

Page 5

Relation to Previous Work

Problem: Find all maximal empty rectangles
• [Lui, Ku, Hsu] & [Orlowski]: between points in real plane
• Our Work: within a 0-1 matrix

Purpose:
• [Lui, Ku, Hsu] & [Orlowski]: Machine Learning, Computational Geometry
• Our Work: Query Optimization

# of maximal 0-rectangles:
• Previous work: O((# 1’s)^2) [Namaad, Hsu, Lee]
• Our Work: O(# 0’s)

Page 6

Relation to Previous Work

Time:
• Previous work ([Lui, Ku, Hsu] & [Orlowski], [Namaad, Hsu, Lee]): O(# 1’s log(# 1’s) + # rectangles) = O(|X||Y|)
• Our Work: O(# 0’s) = O(|X||Y|)

Space:
• Previous work ([Lui, Ku, Hsu] & [Orlowski], [Namaad, Hsu, Lee]): O(|X||Y|)
• Our Work: O(min(|X|, |Y|)), only two rows of matrix kept in memory

Page 7

Relation to Previous Work

Practical Implementation:

Scalable:
• Previous work ([Lui, Ku, Hsu] & [Orlowski], [Namaad, Hsu, Lee]): Scales badly
  • Intensive random memory access
• Our Work: Scales well with respect to
  • # of tuples in join
  • # of maximal rectangles
  • # of values |X| & |Y|
  Requires a single scan of the sorted data

Practical?
• IBM paid us $25,000 to patent it!
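The "single scan of the sorted data" and "only two rows of matrix kept in memory" points can be illustrated with a small streaming sketch. It is an assumption-laden illustration rather than the authors' implementation: the helper names `rows_from_sorted_tuples` and `with_next_row` are hypothetical, tuples are assumed to arrive as `(y, x)` pairs sorted by y and then x, and `y_values` is assumed to list the row values in that same order.

```python
def rows_from_sorted_tuples(sorted_tuples, x_values, y_values):
    """Yield one 0-1 row per y value from (y, x) tuples sorted by y, then x."""
    col = {x: i for i, x in enumerate(x_values)}
    it = iter(sorted_tuples)
    pending = next(it, None)
    for y in y_values:
        row = [0] * len(x_values)
        while pending is not None and pending[0] == y:
            row[col[pending[1]]] = 1  # the tuple (y, x) is present
            pending = next(it, None)
        yield row


def with_next_row(rows):
    """Yield (current, next) pairs so that only two rows are held at once."""
    it = iter(rows)
    current = next(it, None)
    while current is not None:
        following = next(it, None)  # None after the last row
        yield current, following
        current = following


# Assumed example: the tuples of the 3x3 matrix, written as (B, A) and sorted.
stream = rows_from_sorted_tuples([(6, 3), (7, 1), (8, 3)], [1, 2, 3], [6, 7, 8])
for current, following in with_next_row(stream):
    print(current, following)
```

Because both helpers are generators, the tuple stream is consumed once and no more than two matrix rows exist at any time, which is the memory behaviour the slide describes.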

Page 8

Structure of Algorithm

loop y = 1..|Y|
    loop x = 1..|X|
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner
        • Maintain the loop invariant

Timing

O(1) amortized time per <x,y>

[Figure: 0-1 matrix with axes X and Y; the current position <x,y> is marked with *.]

Page 9

Designing an Algorithm

• Define Problem
• Define Loop Invariants
• Define Measure of Progress
• Define Step
• Define Exit Condition
• Maintain Loop Invariant
• Make Progress
• Initial Conditions
• Ending

[Figure: measure-of-progress illustration labeled "79 km to school", "75 km", "0 km", and "Exit".]

Page 10

Define the Loop Invariant

• We have read the matrix up to <x,y> and cannot reread the matrix.
• We must output all maximal 0-rectangles with <x,y> as bottom-right corner.
• What must we remember?

[Figure: 0-1 matrix with axes X and Y, read up to the current position <x,y>, marked with *.]

Page 11

Stack of steps

[Figure: 0-1 matrix with axes X and Y and the current position <x,y>, marked with *. The 0s ending at <x,y> form a staircase whose steps (x_1,y_1), (x_2,y_2), ..., (x_5,y_5), ..., (x_r,y_r) are kept on a stack; x* and y* are marked.]

Page 12

Constructing Maximal Rectangles

[Figure: 0-1 matrix with axes X and Y and the staircase at <x,y>, marked with *, showing steps (x_1,y_1), ..., (x_r,y_r) and the rectangle from a step (x_i,y_i) to <x,y>.]

Page 13

Constructing Maximal Rectangles

[Figure: 0-1 matrix with axes X and Y and the staircase at <x,y>, marked with *, showing steps (x_1,y_1), ..., (x_r,y_r). The rectangles from the steps to <x,y> are labeled:]

• Too Narrow
• Maximal
• Too short
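One way to read the "Too Narrow / Maximal / Too short" labels is as two threshold tests on each step of the staircase. The sketch below encodes that reading; it is an interpretation of the figure, not a transcription of the authors' code. The function name, the 0-based coordinates, and the exact definitions of `x_star` and `y_star` given in the docstring are assumptions.

```python
def maximal_from_staircase(steps, x, y, x_star, y_star):
    """Return the maximal rectangles whose bottom-right corner is (x, y).

    steps  : list of (x_i, y_i) top-left corners of the staircase at (x, y)
    x_star : leftmost column such that row y + 1 is all 0 from x_star to x
             (x + 1 if the cell below (x, y) is a 1 or there is no row below)
    y_star : topmost row such that column x + 1 is all 0 from y_star to y
             (y + 1 if the cell right of (x, y) is a 1 or there is no such column)
    """
    result = []
    for x_i, y_i in steps:
        too_narrow = x_i >= x_star  # row below is empty across its width: it can grow down
        too_short = y_i >= y_star   # next column is empty along its height: it can grow right
        if not too_narrow and not too_short:
            result.append((x_i, y_i, x, y))
    return result


# Assumed values for the 3x3 example matrix (rows 0 0 1 / 1 0 0 / 0 0 1) at cell (1, 2).
print(maximal_from_staircase([(0, 2), (1, 0)], x=1, y=2, x_star=2, y_star=3))
```

Steps are already maximal upward and to the left by construction of the staircase, so only the downward and rightward extensions need checking.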

Page 14

Constructing staircase(x,y) from staircase(x-1,y)

Case 1

[Figure: 0-1 matrix showing the staircase at <x-1,y> and the next position <x,y>, each marked with *.]

Page 15

Constructing staircase(x,y) from staircase(x-1,y)

Case 2

[Figure: 0-1 matrix with axes X and Y showing the staircase at <x-1,y>, marked with *, with steps (x_1,y_1), ..., (x_r,y_r).]

Page 16

Constructing staircase(x,y) from staircase(x-1,y)

[Figure: 0-1 matrix with axes X and Y showing the staircase at <x-1,y> and the next position <x,y>, each marked with *, with steps (x_1,y_1), ..., (x_r,y_r). Steps are labeled Too Narrow, Maximal, and Too short; some steps are marked Delete and the others Keep.]
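The Delete/Keep picture suggests a stack update when moving from <x-1,y> to <x,y>. The following is a minimal sketch of such an update under stated assumptions, not the authors' exact procedure: the name `update_staircase` and the parameter `column_top` (the row of the topmost 0 in the run of 0s that ends at the current cell) are hypothetical, coordinates are 0-based, and no attempt is made to match the slide's case numbering.

```python
def update_staircase(steps, x, cell, column_top):
    """Turn staircase(x-1, y) into staircase(x, y) in place.

    steps      : stack of (x_i, y_i); x_i increases and y_i decreases toward the top
    cell       : value of the matrix at (x, y)
    column_top : row of the topmost 0 in the run of 0s ending at (x, y)
    """
    if cell == 1:
        steps.clear()              # no 0-rectangle can end at a 1
        return steps
    start = x
    while steps and steps[-1][1] < column_top:
        start = steps.pop()[0]     # step is taller than column x allows: delete it
    if not steps or steps[-1][1] > column_top:
        steps.append((start, column_top))  # new step at the height of column x
    return steps


# Assumed trace on the last row of the 3x3 example (rows 0 0 1 / 1 0 0 / 0 0 1).
stairs = []
update_staircase(stairs, x=0, cell=0, column_top=2)
update_staircase(stairs, x=1, cell=0, column_top=0)
print(stairs)
```

Deleted steps are replaced by at most one new step that inherits the leftmost deleted x-coordinate, so each cell creates at most one step.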

Page 17

Constructing x* & y*

[Figure: 0-1 matrix showing the staircase at <x,y>, marked with *, with steps (x_1,y_1), ..., (x_r,y_r) and the positions x* and y*.]

Page 18

Location of last 1 seen in each column

[Figure: 0-1 matrix with axes X and Y and the current position <x,y>, marked with *; in each column the most recently seen 1 is recorded.]
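Remembering the last 1 seen in each column takes one integer per column. The sketch below shows how that array yields, for every cell, the top of the run of 0s ending there. The function name `column_tops` and the 0-based convention (with -1 meaning "no 1 seen yet") are assumptions for illustration.

```python
def column_tops(rows):
    """For each row, yield the top row of the run of 0s ending at each cell.

    A value of y + 1 at a cell of row y means the cell itself holds a 1.
    """
    last_one = None
    for y, row in enumerate(rows):
        if last_one is None:
            last_one = [-1] * len(row)  # no 1 seen yet in any column
        for x, cell in enumerate(row):
            if cell == 1:
                last_one[x] = y         # remember the most recent 1 in column x
        yield [r + 1 for r in last_one]


example = [[0, 0, 1],
           [1, 0, 0],
           [0, 0, 1]]
for tops in column_tops(example):
    print(tops)
```

The array is updated in place as rows stream by, so it never needs more than one pass over the data.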

Page 19

Structure of Algorithm

loop y = 1..|Y|
    loop x = 1..|X|
        • Construct staircase(x,y)
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner

Timing

O(1) amortized time per <x,y>

[Figure: 0-1 matrix with axes X and Y; the current position <x,y> is marked with *.]
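Putting the pieces together, the two nested loops can be sketched end to end. This is a simplified reconstruction in the spirit of the slides, not the authors' implementation. All names are hypothetical, indices are 0-based, and rectangles are reported as `(x_left, y_top, x_right, y_bottom)`. One known simplification: the final `for` loop scans the whole stack at each cell, so this sketch does not achieve the O(1) amortized bound stated on the slide.

```python
def maximal_zero_rectangles(rows):
    """Yield maximal 0-rectangles as (x_left, y_top, x_right, y_bottom).

    rows is any iterable of equal-length 0-1 lists; only the current row and
    the one after it are held, plus one integer per column.
    """
    it = iter(rows)
    cur = next(it, None)
    last_one = None                    # last_one[x]: row of the last 1 seen in column x
    y = 0
    while cur is not None:
        nxt = next(it, None)
        n = len(cur)
        if last_one is None:
            last_one = [-1] * n
        steps = []                     # staircase as a stack of (x_i, y_i)
        last_one_below = -1            # column of the last 1 seen so far in row y + 1
        for x in range(n):
            if nxt is not None and nxt[x] == 1:
                last_one_below = x
            if cur[x] == 1:
                steps.clear()          # no 0-rectangle ends at a 1
                last_one[x] = y
                continue
            top = last_one[x] + 1      # top of the run of 0s in column x
            start = x
            while steps and steps[-1][1] < top:
                start = steps.pop()[0]  # delete steps taller than column x
            if not steps or steps[-1][1] > top:
                steps.append((start, top))
            # Thresholds for growing down (x_star) and right (y_star).
            x_star = x + 1 if nxt is None else last_one_below + 1
            if x + 1 == n or cur[x + 1] == 1:
                y_star = y + 1
            else:
                y_star = last_one[x + 1] + 1
            for x_i, y_i in steps:
                if x_i < x_star and y_i < y_star:
                    yield (x_i, y_i, x, y)
        cur = nxt
        y += 1


example = [[0, 0, 1],
           [1, 0, 0],
           [0, 0, 1]]
print(list(maximal_zero_rectangles(example)))
```

Each maximal rectangle is emitted exactly once, at its bottom-right corner, which matches the output rule stated in the loop structure.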

Page 20

Timing

[Figure: 0-1 matrix with axes X and Y showing the staircase at <x,y>, marked with *, with steps (x_1,y_1), ..., (x_r,y_r) labeled Too Narrow, Maximal, and Too short. The steps marked Delete are annotated "Only work that is not constant time".]

Page 21

Timing

Amortized # of steps deleted (per <x,y>) = # of steps created (per <x,y>) ≤ 1

[Figure: 0-1 matrix showing the staircase at <x-1,y>, marked with *, and the step created when moving to <x,y>.]
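The amortized argument can be checked empirically by counting stack operations: a step can only be deleted after it was created, and each cell creates at most one step. The counter below is a sketch under assumptions. It uses a simplified stack-based staircase update that is not necessarily identical to the authors' procedure, and the function name is hypothetical.

```python
def count_stack_operations(matrix):
    """Count steps created and deleted while building every staircase.

    Deletions triggered by reading a 1 (the stack is cleared) are counted too.
    Steps still on the stack at the end of a row are simply dropped.
    """
    last_one = [-1] * len(matrix[0])
    created = deleted = 0
    for y, row in enumerate(matrix):
        steps = []
        for x, cell in enumerate(row):
            if cell == 1:
                deleted += len(steps)
                steps.clear()
                last_one[x] = y
                continue
            top = last_one[x] + 1
            start = x
            while steps and steps[-1][1] < top:
                start = steps.pop()[0]
                deleted += 1
            if not steps or steps[-1][1] > top:
                steps.append((start, top))
                created += 1           # at most one step created per cell
    cells = len(matrix) * len(matrix[0])
    return created, deleted, cells


example = [[0, 0, 1],
           [1, 0, 0],
           [0, 0, 1]]
created, deleted, cells = count_stack_operations(example)
print(created, deleted, cells)
```

For any input, `deleted <= created <= cells` holds in this sketch, which is the amortized bound the slide states.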

Page 22

Number of Maximal Rectangles

# of maximal 0-rectangles:

• ≤ O((# 1’s)^2) [Namaad, Hsu, Lee]
• ≤ Running time of alg = O(# 0’s)

Page 23

Designing an Algorithm

• Define Problem
• Define Loop Invariants
• Define Measure of Progress
• Define Step
• Define Exit Condition
• Maintain Loop Invariant
• Make Progress
• Initial Conditions
• Ending

[Figure: measure-of-progress illustration labeled "79 km to school", "75 km", "0 km", and "Exit".]