Trading off space for passes in graph streaming problems Camil Demetrescu Irene Finocchi Andrea Ribichini University of Rome “La Sapienza” Dagstuhl Seminar 05361


Dec 16, 2015

Page 1

Trading off space for passes in graph streaming problems

Camil Demetrescu, Irene Finocchi, Andrea Ribichini

University of Rome “La Sapienza”

Dagstuhl Seminar 05361

Page 2

Processing massive data streams

Large body of work in recent years

Practically motivated, raises interesting theoretical questions

Areas: Databases, Sensors, Networking, Hardware, Programming languages

Core problems: Algorithms, Complexity, Statistics, Probability, Approximation theory

Page 3

Classical streaming

[Figure: a working memory M scans the input stream from start to end, once per pass (1st pass, 2nd pass, ...)]

p = number of passes

s = size of working memory M (space in bits)

n = size of input stream (# of items)

Page 4

Classical streaming

Seminal work by Munro and Paterson (1980): pass-efficient selection and sorting

Several problems shown to be solvable with polylog(n) space and passes in the 90’s (e.g., approximating frequency moments)

Classical streaming is very restrictive: for many fundamental problems (e.g., on graphs) it is provably impossible to achieve polylog(n) space and passes

Page 5

Graph streaming problems

For many basic graph problems (e.g., connectivity, shortest paths):

passes = Ω(N / space), where N = number of vertices

Recent interest in graph problems in “semi-streaming” models, where:

space = O(N · polylog(N))

passes = O(polylog(N))

[Feigenbaum et al., ICALP 2004]

O(N · polylog(N)) space “sweet spot” for graph streaming problems [Muthukrishnan, 2001]

Page 6

Graph algorithms in classical streaming

Approximate triangle counting [Bar-Yossef et al., SODA 2002]

Matching, bipartiteness, connectivity, MST, t-spanners, … [Feigenbaum et al., ICALP 2004, SODA 2005]

All of them make one, or very few passes, but require Ω(N) space

Page 7

Trading off space for passes

Natural question: Can we reduce space if we do more passes?

[Munro and Paterson ‘80, Henzinger et al. ‘99]

Example:

Processing a 50 GB graph on a 1 GB RAM PC (4 billion vertices, 6 billion edges)

s = Θ(N/p) algorithm: ~16 passes (a few hours)

s = Θ(N) algorithm: out of memory (16 GB RAM would be required)
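The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes 4 bytes of state per vertex (inferred from 16 GB for 4 billion vertices, not stated on the slide), decimal gigabytes, and no hidden constant factors. The function name `passes_needed` is hypothetical.

```python
def passes_needed(num_vertices: int, bytes_per_vertex: int, ram_bytes: int) -> int:
    """Passes required when each pass can hold only ram_bytes of per-vertex state.

    Assumption: space scales as N / p, so p = ceil(total_state / ram).
    """
    total_state = num_vertices * bytes_per_vertex
    return -(-total_state // ram_bytes)  # integer ceiling division


GB = 10**9                      # decimal gigabyte (assumption)
N = 4 * 10**9                   # 4 billion vertices, from the slide
BYTES_PER_VERTEX = 4            # assumption: inferred from 16 GB / 4 billion

linear_space_gb = N * BYTES_PER_VERTEX // GB
print(linear_space_gb)                              # 16 (GB needed with linear space)
print(passes_needed(N, BYTES_PER_VERTEX, 1 * GB))   # 16 (passes with 1 GB of RAM)
```

Under these assumptions the linear-space algorithm needs 16 GB, while the algorithm that trades space for passes fits in 1 GB at the cost of about 16 sequential scans.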

Page 8

Some facts on modern commodity I/O

A RAID disk controller can deliver 100 MB/s access rate

On a 1+ GHz Pentium PC, random access to 2 GB of main memory in 32-byte chunks: 80 MB/s effective access rate

Sequential access rates are comparable to (or even faster than) random access rates in main memory:

Sequential access uses caches optimally (this makes algorithms cache-oblivious)

[Ruhl ‘03 - Rajagopalan ‘02]

Page 9

Some facts on modern commodity I/O

Classical read-only streaming perhaps overly pessimistic?

Why not exploit temporary storage?

Above facts imply that both reading and writing sequentially can improve performance

External memory storage is cheap (less than a dollar per gigabyte) and readily available

Page 10

The StreamSort model [Aggarwal et al. ’04]

[Figure: in the 1st pass, working memory M reads the input stream and writes an intermediate stream; in the 2nd pass, M reads the intermediate stream and writes the output stream]

Use a sorting primitive to reorder the stream

Page 11

How much power does sorting yield?

Open problem: No clue on how to get polylog(N) bounds for Shortest Paths (even BFS) in StreamSort

Good news: Undirected connectivity can be solved in polylog(N) space and passes in StreamSort

[Aggarwal et al., FOCS 2004]

Page 12

Dish of the day

In this model, we show effective space/passes tradeoffs for natural graph streaming problems

We address:

- Connectivity
- Single-source shortest paths

We show that StreamSort can yield interesting results even without using sorting at all

(call this more restrictive model W-Stream: allows intermediate streams, but no sorting)

Page 13

Graph connectivity

UCON: G=(V,E) undirected graph with N vertices given as stream of edges in arbitrary order. Find out if G is connected.

Lower bound: UCON in W-Stream p = Ω(N / s)

We now show the following:

Upper bound: UCON in W-Stream p = O(N · log N / s)

Page 14

Graph connectivity: algorithm

[Figure: a pass transforms the input stream G into the output stream G’ using a forest F; example graph on vertices 1 to 12]

Generic pass: two phases

- Red phase
- Blue phase

Page 15

Graph connectivity: analysis

How many passes?

At each pass we lose at least |V(F)| / 2 = Θ(s / log N) vertices

Invariant: F is induced by a set of edges, so each tree in F contains at least two vertices

p = O(N · log N / s)

All vertices of F that are not component representatives disappear from the output graph
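The slides name the two phases but do not spell them out, so the following is only a sketch of one plausible reading. It assumes the red phase grows the forest F from a prefix of the stream until memory is full, and the blue phase relabels the remaining edges with component representatives and writes them to the output stream. The names `connectivity_pass`, `is_connected` and `capacity` (standing in for the roughly s / log N vertices that fit in memory) are hypothetical.

```python
def connectivity_pass(stream, capacity):
    """One pass: returns (output_stream, number_of_vertices_contracted)."""
    parent = {}  # union-find over the vertices of the in-memory forest F

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    out, merges, red = [], 0, True
    for u, v in stream:
        if red:
            new = {x for x in (u, v) if x not in parent}
            if len(parent) + len(new) <= capacity:
                # Red phase: the edge fits, so absorb it into F.
                for x in new:
                    parent[x] = x
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    merges += 1
                continue
            red = False  # F is full: switch to the blue phase for good
        # Blue phase: F is frozen; relabel endpoints by their representatives.
        ru = find(u) if u in parent else u
        rv = find(v) if v in parent else v
        if ru != rv:  # drop edges that became self-loops
            out.append((ru, rv))
    return out, merges


def is_connected(n, edges, capacity):
    """Repeat passes until the stream is empty; returns (connected, passes)."""
    assert capacity >= 2, "F must be able to hold at least one edge"
    stream, merges, passes = list(edges), 0, 0
    while stream:
        stream, m = connectivity_pass(stream, capacity)
        merges += m
        passes += 1
    return merges == n - 1, passes


path = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(is_connected(5, path, capacity=2))  # (True, 4)
print(is_connected(5, path, capacity=5))  # (True, 1)
```

In this sketch every vertex enters F through an edge, which mirrors the invariant on the slide, and a larger `capacity` contracts more vertices per pass, which is the space/passes tradeoff in miniature.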

Page 16

Single-source shortest paths

SSSP: G=(V,E,w) weighted directed graph with N vertices given as arbitrary stream of edges. Find distances from a given source t to all other vertices.

Lower bound 1: BFS in W-Stream: p = Ω(N / s)

Lower bound 2: finding vertices up to constant distance d: p ≤ d ⇒ s = Ω(N^(1+1/(2d))) [Feigenbaum et al., SODA 2005]

Space-efficient algorithms for SSSP always require multiple passes

Page 17

Single-source shortest paths

Hard even using sorting as a primitive

No sublinear-space streaming algorithm for SSSP previously known.

We make a first step, showing that we can solve SSSP in W-Stream in sublinear space and passes simultaneously in directed graphs with small integer edge weights

Previous results on distances in streaming: approximate (spanners) in undirected graphs only

Page 18

Single-source shortest paths: bound

Thm: For any space restriction s, there is a randomized one-sided error algorithm for directed SSSP in W-Stream with edge weights in {1, 2, …, C} s.t.:

p = O( C · N · log^(3/2) N / √s )

For C = O(s^(1/2-ε)) and polynomial sublinear space, we also get sublinear p

In this talk we focus on C = 1 (BFS):

p = Õ( N / √s )

p = Ω( N / s )

Page 19

Single-source shortest paths: approach

Overall approach: First build many short paths “in parallel”, then stitch them together to form long paths.

For a given space restriction, this helps us reduce the number of passes to find long paths

Page 20

Single-source shortest paths: step 1/5

Pick a set K of (s / log N)^(1/2) random vertices, including the source t

Example (chain): t = 1 → 6 → 10 → 5 → 8 → 3 → 7 → 2 → 4 → 9

Page 21

Single-source shortest paths: step 2/5

Find distances up to (N log N) / |K| from each vertex in K (short distances)

Example (chain): with K = {1, 5, 7, 4} on the chain 1 → 6 → 10 → 5 → 8 → 3 → 7 → 2 → 4 → 9, each vertex of K is at distance 0 from itself and reaches the following vertices at distances 1, 2, 3

The more memory we have, the larger |K|, and thus the smaller the # of passes
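The link between memory and passes can be made explicit with a short calculation. It assumes, as a simplification not stated on the slides, that the number of passes spent on the short distances grows in proportion to the distance bound (N log N) / |K|. Substituting the sample size chosen in step 1 gives:

```latex
\begin{align*}
|K| &= \sqrt{\frac{s}{\log N}}, \\[4pt]
\frac{N \log N}{|K|}
  &= N \log N \cdot \sqrt{\frac{\log N}{s}}
   = \frac{N \log^{3/2} N}{\sqrt{s}} .
\end{align*}
```

Under that assumption the distance bound matches the pass bound of the theorem for C = 1, and it shrinks as the space s grows.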

Page 22

Single-source shortest paths: step 3/5

Build a graph G’ = (K, E’), where: (x,y) ∈ E’ ⇔ dist(x,y) ≤ (N log N) / |K| in G

Example (chain): G’ on K = {1, 5, 7, 4} has edges 1 → 5 (length 3), 5 → 7 (length 3), 7 → 4 (length 2)

Page 23

Single-source shortest paths: step 4/5

Find in G’ distances from t to all other vertices of K

Example (chain): in G’ (edges 1 → 5, 5 → 7, 7 → 4 with lengths 3, 3, 2), the distances from t = 1 to the vertices 1, 5, 7, 4 are 0, 3, 6, 8

Page 24

Single-source shortest paths: step 5/5

For each v, let: dist(t,v) = min over c ∈ K of [ dist(t,c) + dist(c,v) ]

(final distances)

Example (chain): the vertices 1, 5, 7, 4 of K keep distances 0, 3, 6, 8; the remaining vertices 6, 10, 8, 3, 2, 9 get final distances 1, 2, 4, 5, 7, 9
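The five steps can be traced end to end on the chain example. The sketch below runs entirely in memory rather than over streams, fixes K by hand instead of sampling it, and uses a distance bound of 3 as the slide figures appear to do. It illustrates the stitching idea for unit weights (C = 1) only, and the function names are hypothetical.

```python
from collections import deque


def bounded_bfs(adj, src, limit):
    """Step 2: BFS distances from src, exploring no deeper than `limit`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sssp_via_samples(adj, t, K, limit):
    """Steps 2 to 5 for unit weights; K must contain the source t."""
    short = {c: bounded_bfs(adj, c, limit) for c in K}           # step 2
    gprime = {x: {y: d for y, d in short[x].items()              # step 3
                  if y in K and y != x} for x in K}
    far = {c: float("inf") for c in K}                           # step 4
    far[t] = 0
    for _ in range(len(K) - 1):     # Bellman-Ford rounds on the small graph G'
        for x in K:
            for y, w in gprime[x].items():
                far[y] = min(far[y], far[x] + w)
    final = {}                                                   # step 5
    for c in K:
        for v, d in short[c].items():
            candidate = far[c] + d
            if candidate < final.get(v, float("inf")):
                final[v] = candidate
    return final


chain = [1, 6, 10, 5, 8, 3, 7, 2, 4, 9]          # t = 1 is the first vertex
adj = {a: [b] for a, b in zip(chain, chain[1:])}
dist = sssp_via_samples(adj, t=1, K=[1, 5, 7, 4], limit=3)
print([dist[v] for v in chain])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

On this input the sketch reproduces the numbers in the example: 0, 3, 6, 8 for the vertices of K and 1, 2, 4, 5, 7, 9 for the others.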

Page 25

Results are correct with high prob.

Sampling thm. [Greene & Knuth, ’80] Let K be a set of vertices chosen uniformly at random. Then the probability that a simple path with more than (c · N · log N) / |K| vertices intersects K is at least 1 − 1/N^c for any c > 0
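The sampling bound can be checked numerically for concrete values. The sketch below assumes that K consists of |K| distinct vertices drawn uniformly without replacement and that the logarithm is natural; neither detail is stated on the slide. The function name `miss_probability` is hypothetical.

```python
import math


def miss_probability(n: int, k: int, path_len: int) -> float:
    """Exact probability that k distinct random vertices all avoid a fixed
    set of path_len vertices among n (hypergeometric, zero hits)."""
    if path_len >= n:
        return 0.0
    # math.comb returns 0 when k exceeds the number of non-path vertices.
    return math.comb(n - path_len, k) / math.comb(n, k)


n, k, c = 1000, 50, 1
path_len = math.ceil(c * n * math.log(n) / k)   # path length from the theorem
miss = miss_probability(n, k, path_len)
print(path_len)            # 139
print(miss <= n ** -c)     # True
```

For these values a path of 139 vertices is missed by K with probability below 1/N, in line with the theorem's 1 − 1/N^c guarantee.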

Page 26

Conclusions and further work

We have shown effective space/passes tradeoffs for problems that seem hard in classical streaming (graph connectivity & shortest paths)

Can we close the gap between upper and lower bound for BFS in W-Stream?

Can we do the same in the classical read-only streaming model?

Can we prove stronger lower bounds in classical streaming?

Space/passes tradeoffs for other problems?

Page 27

Thank you