All-Pairs Shortest Paths - Floyd's Algorithm

Parallel and Distributed Computing

Department of Computer Science and Engineering (DEI), Instituto Superior Técnico

November 6, 2012

CPD (DEI / IST) Parallel and Distributed Computing – 14, 2012-11-06, 1 / 25
Outline

All-Pairs Shortest Paths, Floyd’s Algorithm

Partitioning

Input / Output

Implementation and Analysis

Benchmarking

Shortest Paths

All Pairs Shortest Paths

Given a weighted, directed graph G(V, E), determine the shortest path between any two nodes in the graph.

(Figure: a directed graph on vertices 0, 1, 2, 3 with edge weights -2, 0, 9, 8, 4, 6, -3, 7 and -5.)

Adjacency Matrix:

    [ 0   -2   -5    4 ]
    [ ∞    0    9    ∞ ]
    [ 7    ∞    0   -3 ]
    [ 8    0    6    0 ]

The Floyd-Warshall Algorithm

Recursive solution based on intermediate vertices.

Let p_ij be the minimum-weight path from node i to node j among paths that use a subset of the intermediate vertices {0, …, k − 1}.

Consider an additional node k:

if k ∉ p_ij, then p_ij is the shortest path considering the subset of intermediate vertices {0, …, k};

if k ∈ p_ij, then we can decompose p_ij as i --p_ik--> k --p_kj--> j, where the subpaths p_ik and p_kj have intermediate vertices in the set {0, …, k − 1}.

d_ij^(k) = w_ij                                          if k = −1
d_ij^(k) = min( d_ij^(k−1), d_ik^(k−1) + d_kj^(k−1) )    if k ≥ 0

The Floyd-Warshall Algorithm

for k ← 0 to |V| − 1
    for i ← 0 to |V| − 1
        for j ← 0 to |V| − 1
            d[i, j] ← min(d[i, j], d[i, k] + d[k, j])

Complexity: Θ(|V|³)

Partitioning

Domain decomposition: divide the adjacency matrix into its |V|² elements (the computation in the inner loop is the primitive task).

Communication

Let k = 1. Row sweep, i = 2.

Let k = 1. Column sweep, j = 3.

In iteration k, every task in row/column k broadcasts its value within its task row/column.

Agglomeration and Mapping

create one task per MPI process

agglomerate tasks to minimize communication

Possible decompositions: row-wise vs. column-wise block striped (n = 11, p = 3).

Relative merit?

Column-wise block striped: broadcast within columns eliminated.

Row-wise block striped: broadcast within rows eliminated; reading, writing and printing the matrix is simpler.

Comparing Decompositions

Choose the row-wise block striped decomposition.

Some tasks get ⌈n/p⌉ rows, others get ⌊n/p⌋. Which task gets which size?

Distributed approach: distribute the larger blocks evenly.

First element of task i: ⌊i n / p⌋
Last element of task i: ⌊(i + 1) n / p⌋ − 1
Task owner of element j: ⌊(p (j + 1) − 1) / n⌋

Dynamic Matrix Allocation

Array allocation: (figure) a stack variable A holds a pointer to a block of elements on the heap.

Matrix allocation: (figure) a stack variable M holds a pointer to a heap array of row pointers, each pointing into a heap block of values.

Reading the Graph Matrix

(Figure: the file is read in blocks, and the blocks are handed to processes 0, 1 and 2.)

Why don't we read the whole file and then execute an MPI_Scatter?

Point-to-point Communication

involves a pair of processes

one process sends a message; the other process receives the message

(Figure: timeline of tasks h, i and j — task i computes and then sends to j; task j waits in a receive from i and resumes computing once the message arrives; task h computes throughout.)

MPI Send

int MPI_Send (
    void *message,
    int count,
    MPI_Datatype datatype,
    int dest,
    int tag,
    MPI_Comm comm
)

MPI Recv

int MPI_Recv (
    void *message,
    int count,
    MPI_Datatype datatype,
    int source,
    int tag,
    MPI_Comm comm,
    MPI_Status *status
)

Coding Send / Receive

...
if (id == j) {
    ...
    /* Receive from i */
    ...
}
...
if (id == i) {
    ...
    /* Send to j */
    ...
}
...

Receive is before Send! Why does this work?

Internals of Send and Receive

(Figure: on the sending process, MPI_Send copies the message from program memory to a system buffer; the message travels to the receiving process's system buffer; there, MPI_Recv copies it into program memory.)

Return from MPI Send

function blocks until the message buffer is free

the message buffer is free when: the message has been copied to a system buffer, or the message has been transmitted

typical scenario: the message is copied to a system buffer, and transmission overlaps computation

Return from MPI Recv

function blocks until a message is in the buffer

if the message never arrives, the function never returns!

Deadlock

Deadlock: a process waiting for a condition that will never become true.

Easy to write send/receive code that deadlocks:

two processes: both receive before send

send tag doesn’t match receive tag

process sends message to wrong destination process

C Code

void compute_shortest_paths (int id, int p, double **a, int n)
{
    int i, j, k;
    int offset;   /* Local index of broadcast row */
    int root;     /* Process controlling row to be bcast */
    double *tmp;  /* Holds the broadcast row */

    tmp = (double *) malloc (n * sizeof(double));
    for (k = 0; k < n; k++) {
        root = BLOCK_OWNER(k,p,n);
        if (root == id) {
            offset = k - BLOCK_LOW(id,p,n);
            for (j = 0; j < n; j++)
                tmp[j] = a[offset][j];
        }
        MPI_Bcast (tmp, n, MPI_DOUBLE, root, MPI_COMM_WORLD);
        for (i = 0; i < BLOCK_SIZE(id,p,n); i++)
            for (j = 0; j < n; j++)
                a[i][j] = MIN(a[i][j], a[i][k]+tmp[j]);
    }
    free (tmp);
}

Analysis of the Parallel Algorithm

Let α be the time to compute an iteration.

Sequential execution time: αn³

Computation time of the parallel program: αn⌈n/p⌉n = αn²⌈n/p⌉

    innermost loop executed n times
    middle loop executed at most ⌈n/p⌉ times
    outer loop executed n times

Number of broadcasts: n (one per outer loop iteration)

Broadcast time: ⌈log p⌉ (λ + 4n/β)

    each broadcast has ⌈log p⌉ steps
    λ is the message latency
    β is the bandwidth
    each broadcast sends 4n bytes

Expected parallel execution time: αn²⌈n/p⌉ + n⌈log p⌉ (λ + 4n/β)

Analysis of the Parallel Algorithm

The previous expression overestimates the parallel execution time: after the first iteration, broadcast transmission time overlaps with the computation of the next row.

Expected parallel execution time:

αn²⌈n/p⌉ + n⌈log p⌉ λ + ⌈log p⌉ (4n/β)

Experimental measurements:

α = 25.5 ns
λ = 250 µs
β = 10⁷ bytes/s

Experimental Results

Procs   Ideal   Predict 1   Predict 2   Actual
  1     25.5      25.5        25.5       25.5
  2     12.8      13.4        13.0       13.9
  3      8.5       9.5         8.9        9.6
  4      6.4       7.7         6.9        7.3
  5      5.1       6.6         5.7        6.0
  6      4.3       5.9         4.9        5.2
  7      3.6       5.5         4.3        4.5
  8      3.2       5.1         3.9        4.0

(execution times in seconds)

Review

All-Pairs Shortest Paths, Floyd’s Algorithm

Partitioning

Input / Output

Implementation and Analysis

Benchmarking

Next Class

Performance metrics
