Mining for Empty Rectangles in Large Data Sets

Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller

Post on 13-Dec-2015



2

Matrix representation

π_{A,B}(R ⋈ S):

    A  B
    3  6
    1  7
    3  8

0-1 matrix (columns: A = 1, 2, 3; rows: B = 6, 7, 8):

         1  2  3
    6    0  0  1
    7    1  0  0
    8    0  0  1
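A minimal sketch of the matrix representation, in Python. The tuple list (3,6), (1,7), (3,8) is one reading of the flattened A/B table on this slide, and `to_matrix` is a hypothetical helper name, not something taken from the talk.

```python
def to_matrix(tuples, x_values, y_values):
    """Build a 0-1 matrix m[row][col]; rows follow y_values, columns follow x_values."""
    col = {a: i for i, a in enumerate(x_values)}
    row = {b: j for j, b in enumerate(y_values)}
    m = [[0] * len(x_values) for _ in y_values]
    for a, b in tuples:
        m[row[b]][col[a]] = 1  # a 1 marks a value pair that occurs in the join
    return m


# Assumed contents of the projected join result (A, B).
pairs = [(3, 6), (1, 7), (3, 8)]
matrix = to_matrix(pairs, x_values=[1, 2, 3], y_values=[6, 7, 8])
for line in matrix:
    print(line)
```

Every 0 in the resulting matrix is a pair of values (A, B) that never occurs together in the join, which is what the empty rectangles summarize.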

3

Find All Maximal 0-Rectangles

π_{A,B}(R ⋈ S) as a 0-1 matrix (columns: A = 1, 2, 3; rows: B = 6, 7, 8):

         1  2  3
    6    0  0  1
    7    1  0  0
    8    0  0  1

[Figure: the same matrix with its maximal 0-rectangles highlighted]
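To make "maximal 0-rectangle" concrete, here is a brute-force reference sketch in Python. It is not the algorithm of the talk: it simply tries every rectangle and keeps those that contain only 0s and cannot grow by one row or column in any direction. The function names and the 0-based (x1, y1, x2, y2) output convention are assumptions made for illustration.

```python
def is_empty(m, x1, y1, x2, y2):
    """True if every cell in columns x1..x2 and rows y1..y2 is 0."""
    return all(m[y][x] == 0 for y in range(y1, y2 + 1) for x in range(x1, x2 + 1))


def maximal_zero_rectangles_bruteforce(m):
    """Return all maximal 0-rectangles as sorted (x1, y1, x2, y2) tuples."""
    n_rows, n_cols = len(m), len(m[0])
    found = []
    for y1 in range(n_rows):
        for y2 in range(y1, n_rows):
            for x1 in range(n_cols):
                for x2 in range(x1, n_cols):
                    if not is_empty(m, x1, y1, x2, y2):
                        continue
                    # A rectangle is maximal if it cannot grow in any direction.
                    left = x1 > 0 and is_empty(m, x1 - 1, y1, x1 - 1, y2)
                    right = x2 < n_cols - 1 and is_empty(m, x2 + 1, y1, x2 + 1, y2)
                    up = y1 > 0 and is_empty(m, x1, y1 - 1, x2, y1 - 1)
                    down = y2 < n_rows - 1 and is_empty(m, x1, y2 + 1, x2, y2 + 1)
                    if not (left or right or up or down):
                        found.append((x1, y1, x2, y2))
    return sorted(found)


example = [[0, 0, 1],
           [1, 0, 0],
           [0, 0, 1]]
print(maximal_zero_rectangles_bruteforce(example))
```

On the 3x3 example matrix this finds four maximal 0-rectangles. The cost grows very quickly with matrix size, so it is only useful as a check against a faster method.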

4

Example

π_{A,B}(R ⋈ S) over Car and Year:

                 95  96  97
    BMW Z3        0   0   1
    Honda L2      1   0   0
    Toyota 6A     0   0   1

Empty rectangle (0 0): BMW Z3, years 95 and 96.

First BMW Z3 series cars were made in 1997.

5

Relation to Previous Work

Previous work: [Lui, Ku, Hsu] & [Orlowski]

Purpose:
• Previous work: Machine Learning, Computational Geometry
• Our work: Query Optimization

Problem: Find all maximal empty rectangles
• Previous work: between points in the real plane
• Our work: within a 0-1 matrix

# of maximal 0-rectangles:
• Previous work: O( (# 1’s)^2 ) [Namaad, Hsu, Lee]
• Our work: O( # 0’s )

6

Relation to Previous Work

Previous work: [Lui, Ku, Hsu] & [Orlowski], [Namaad, Hsu, Lee]

Time:
• Previous work: O( # 1’s log(# 1’s) + # rectangles ) = O(|X||Y|)
• Our work: O( # 0’s ) = O(|X||Y|)

Space:
• Previous work: O(|X||Y|)
• Our work: O(min(|X|, |Y|)); only two rows of the matrix are kept in memory

7

Relation to Previous Work

Previous work: [Lui, Ku, Hsu] & [Orlowski], [Namaad, Hsu, Lee]

Practical Implementation:
• Previous work: intensive random memory access
• Our work: requires a single scan of the sorted data

Scalable:
• Previous work: scales badly
• Our work: scales well with respect to the # of tuples in the join, the # of maximal rectangles, and the # of values |X| & |Y|

Practical?
• IBM paid us $25,000 to patent it!

8

Structure of Algorithm

loop y = 1..|Y|
    loop x = 1..|X|
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner
        • Maintain the loop invariant

Timing: O(1) amortized time per <x,y>

[Figure: 0-1 matrix with axes X and Y, scanned up to the current position <x,y>, marked *]

9

Designing an Algorithm

• Define Problem
• Define Loop Invariants
• Define Measure of Progress
• Define Step
• Define Exit Condition
• Maintain Loop Invariant
• Make Progress
• Initial Conditions
• Ending

[Figure: a trip to school as a measure of progress, with distances 79 km, 75 km and 0 km and Exit markers]

10

Define the Loop Invariant

• We have read the matrix up to <x,y> and cannot reread the matrix.
• We must output all maximal 0-rectangles with <x,y> as bottom-right corner.
• What must we remember?

[Figure: 0-1 matrix with axes X and Y, read up to the current position <x,y>, marked *]

11

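Stack of steps

[Figure: 0-1 matrix with axes X and Y and current position <x,y>, marked *. The 0-elements ending at <x,y> form a staircase whose steps (x_1,y_1), (x_2,y_2), ..., (x_r,y_r) are kept on a stack; the drawing shows steps (x_1,y_1) through (x_5,y_5). The values x* and y* are also marked.]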

12

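Constructing Maximal Rectangles

[Figure: 0-1 matrix with axes X and Y, showing the staircase steps from (x_1,y_1) to (x_r,y_r) and the 0-rectangle that runs from one intermediate step to the bottom-right corner <x,y>, marked *]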

13

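Constructing Maximal Rectangles

Each step of the staircase gives a 0-rectangle with <x,y>, marked *, as bottom-right corner. The figure labels these rectangles as:
• Too narrow
• Maximal
• Too short

[Figure: 0-1 matrix with axes X and Y, showing the staircase steps from (x_1,y_1) to (x_r,y_r)]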
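One plausible reading of the three labels, sketched in Python: a step's rectangle is "too narrow" if it fits inside a run of 0s in the next row (so it can grow downward), "too short" if it fits beside a run of 0s in the next column (so it can grow to the right), and "maximal" otherwise. This mapping of labels to tests is an assumption drawn from the figure labels, and `classify_step` is a hypothetical helper that looks at the matrix directly rather than using x* and y*.

```python
def classify_step(m, x, y, xi, yi):
    """Classify the 0-rectangle with top-left (xi, yi) and bottom-right (x, y).

    The rectangle is assumed to be a staircase step, i.e. it already cannot
    grow upward or to the left.
    """
    n_rows, n_cols = len(m), len(m[0])
    # It can grow down only if row y+1 exists and is all 0 over columns xi..x.
    grows_down = y + 1 < n_rows and all(m[y + 1][c] == 0 for c in range(xi, x + 1))
    # It can grow right only if column x+1 exists and is all 0 over rows yi..y.
    grows_right = x + 1 < n_cols and all(m[r][x + 1] == 0 for r in range(yi, y + 1))
    if grows_down:
        return "too narrow"
    if grows_right:
        return "too short"
    return "maximal"


grid = [[1, 1, 1, 0, 1],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1]]
# Steps of the staircase at <x,y> = (3, 2), from tallest to widest.
steps = [(3, 0), (2, 1), (0, 2)]
labels = [classify_step(grid, 3, 2, xi, yi) for xi, yi in steps]
print(labels)
```

In this example the tallest step can slide down into row 3, the widest step can extend into column 4, and only the middle step is blocked in both directions.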

14

Constructing staircase(x,y) from staircase(x-1,y)

Case 1

[Figure: the staircase at <x-1,y> and the staircase at <x,y>, each position marked *]

15

Constructing staircase(x,y) from staircase(x-1,y)

Case 2

[Figure: 0-1 matrix with axes X and Y, showing the staircase at <x-1,y>, marked *, with steps from (x_1,y_1) to (x_r,y_r)]

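16

Constructing staircase(x,y) from staircase(x-1,y)

[Figure: 0-1 matrix with axes X and Y, showing the staircase at <x-1,y> and the new position <x,y>, each marked *. Steps from (x_1,y_1) to (x_r,y_r) are labelled Too narrow, Maximal or Too short; some steps are marked Delete and the others Keep.]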

17

Constructing x* & y*

[Figure: 0-1 matrix showing the staircase at <x,y>, marked *, with steps from (x_1,y_1) to (x_r,y_r) and the positions x* and y*]

18

Location of last 1 seen in each column

[Figure: 0-1 matrix with axes X and Y and current position <x,y>; the last 1 seen in each column is marked *]
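A minimal sketch, in Python, of how the location of the last 1 in each column could be maintained during a single row-by-row scan. The generator name `last_one_per_column` and the use of -1 for "no 1 seen yet" are assumptions for illustration; the point is only that one array of length |X| is enough and no earlier row needs to be reread.

```python
def last_one_per_column(rows, width):
    """After each row y, yield a list whose entry x is the row of the last 1 in column x.

    An entry of -1 means no 1 has been seen in that column so far.
    """
    last_one = [-1] * width
    for y, row in enumerate(rows):
        for x, value in enumerate(row):
            if value == 1:
                last_one[x] = y
        yield list(last_one)  # copy, so callers can keep each snapshot


rows = [[0, 0, 1],
        [1, 0, 0],
        [0, 0, 1]]
snapshots = list(last_one_per_column(rows, width=3))
for snapshot in snapshots:
    print(snapshot)
```

At row y, the number of consecutive 0s ending at <x,y> is `y - last_one[x]`; for column 1 at the last row of this example that is 2 - (-1) = 3.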

19

Structure of Algorithm

loop y = 1..|Y|
    loop x = 1..|X|
        • Construct staircase(x,y)
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner

Timing: O(1) amortized time per <x,y>

[Figure: 0-1 matrix with axes X and Y, scanned up to the current position <x,y>, marked *]
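The loop structure can be sketched in Python with a stack of steps per row. This is a standard monotonic-stack formulation written under stated assumptions, not necessarily the authors' exact procedure: it keeps the whole matrix in memory for simplicity (only rows y and y+1 are consulted at a time), uses 0-based coordinates, and reports a rectangle at the moment its step is deleted, that is, when the next column is too low for it. A deleted step is already blocked above, to the left and to the right, so it is maximal exactly when the row below blocks it too.

```python
def maximal_zero_rectangles(matrix):
    """Return all maximal 0-rectangles as sorted (x1, y1, x2, y2) tuples."""
    if not matrix or not matrix[0]:
        return []
    n_rows, n_cols = len(matrix), len(matrix[0])
    height = [0] * n_cols  # height[x]: consecutive 0s ending at row y in column x
    found = []
    for y in range(n_rows):
        for x in range(n_cols):
            height[x] = 0 if matrix[y][x] else height[x] + 1
        # zeros_below[x]: consecutive 0s ending at column x in row y + 1
        zeros_below = [0] * n_cols
        if y + 1 < n_rows:
            run = 0
            for x in range(n_cols):
                run = 0 if matrix[y + 1][x] else run + 1
                zeros_below[x] = run
        stack = []  # steps (left_column, height), heights strictly increasing
        for x in range(n_cols + 1):
            current = height[x] if x < n_cols else 0  # sentinel column of 1s
            left = x
            while stack and stack[-1][1] >= current:
                left, step_height = stack.pop()
                if step_height > current:
                    # Blocked on the right by column x; maximal if also blocked below.
                    right = x - 1
                    if zeros_below[right] < right - left + 1:
                        found.append((left, y - step_height + 1, right, y))
            if current > 0:
                stack.append((left, current))
    return sorted(found)


example = [[0, 0, 1],
           [1, 0, 0],
           [0, 0, 1]]
print(maximal_zero_rectangles(example))
```

Each cell pushes at most one step and every pop removes a step pushed earlier, so the stack work per row is linear in the row length.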

20

Timing

Deleting steps (marked Delete in the figure) is the only work that is not constant time.

[Figure: 0-1 matrix with axes X and Y, showing the staircase at <x,y>, marked *, with steps from (x_1,y_1) to (x_r,y_r) labelled Too narrow, Maximal or Too short]

21

Timing

Amortized # of steps deleted (per <x,y>) = # of steps created (per <x,y>) ≤ 1

[Figure: the staircase at <x-1,y>, marked *]
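The amortized argument can be checked on a small case by counting stack operations. The Python sketch below assumes the staircase of one row is maintained as a stack keyed by column heights (the number of consecutive 0s ending in each column); `count_stack_operations` is a hypothetical helper name, and the heights are made-up input.

```python
def count_stack_operations(heights):
    """Return (steps created, steps deleted) while scanning one row left to right."""
    stack, created, deleted = [], 0, 0
    for x, current in enumerate(heights):
        left = x
        # Delete every step that is at least as tall as the new column.
        while stack and stack[-1][1] >= current:
            left = stack.pop()[0]
            deleted += 1
        # Create at most one step per position.
        if current > 0:
            stack.append((left, current))
            created += 1
    return created, deleted


heights = [1, 2, 3, 4, 1]
created, deleted = count_stack_operations(heights)
print(created, deleted)
```

The last position deletes four steps at once, yet the total number of deletions (4) cannot exceed the total number of creations (5), which is at most one per position.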

22

Number of Maximal Rectangles

# of maximal 0-rectangles:
• ≤ O( (# 1’s)^2 ) [Namaad, Hsu, Lee]
• ≤ running time of alg = O( # 0’s )

23

Designing an Algorithm

• Define Problem
• Define Loop Invariants
• Define Measure of Progress
• Define Step
• Define Exit Condition
• Maintain Loop Invariant
• Make Progress
• Initial Conditions
• Ending

[Figure: a trip to school as a measure of progress, with distances 79 km, 75 km and 0 km and Exit markers]
