2006-09-29 Emin Gabrielyan, Three Topics in Parallel Communications
Thesis presentation by Emin Gabrielyan
Parallel communications: bandwidth enhancement or fault-tolerance?

We do not know whether parallel communications were first used for fault-tolerance or for bandwidth enhancement.
In 1964 Paul Baran proposed parallel communications for fault-tolerance (inspiring the design of ARPANET and the Internet).
In 1981 IBM introduced the 8-bit parallel port for faster communication.
Bandwidth enhancement by parallelizing the sources and sinks

Bandwidth enhancement can be achieved by adding parallel paths.
But a greater capacity enhancement is achieved if we can replace the senders and destinations with parallel sources and sinks.
This is possible in parallel I/O (the first topic of the thesis).
Parallel transmissions in coarse-grained networks cause congestion

In coarse-grained circuit-switched HPC networks, uncoordinated parallel transmissions cause congestion.
The overall throughput degrades due to access conflicts on shared resources.
Coordination of parallel transmissions is covered by the second topic of my thesis (liquid scheduling).
Classical backup parallel circuits for fault-tolerance

Typically the redundant resource remains idle.
As soon as there is a failure of the primary resource,
the backup resource replaces the primary one.
Parallelism in living organisms

Parallelism is observed in almost every living organism.
Duplication of organs primarily serves fault-tolerance,
and, as a secondary purpose, capacity enhancement.
Simultaneous parallelism for fault-tolerance in fine-grained networks

A challenging bio-inspired solution is to use simultaneously all available paths for achieving fault-tolerance.
This topic is addressed in the last part of my presentation (capillary routing).
Fine Granularity Parallel I/O for Cluster Computers
SFIO, a Striped File parallel I/O
Why is parallel I/O required?

A single I/O gateway for a cluster computer saturates;
it does not scale with the size of the cluster.
What is Parallel I/O for Cluster Computers
Some or all of the cluster computers can be used for parallel I/O
Objectives of parallel I/O

Resistance to concurrent access
Scalability as the number of I/O nodes increases
High level of parallelism and load balance for all application patterns and all types of I/O requests
Parallel I/O Subsystem

Concurrent access by multiple compute nodes:
no concurrent access overheads,
no performance degradation
when the number of compute nodes increases.
Scalable throughput of the parallel I/O subsystem

The overall parallel I/O throughput should increase linearly as the number of I/O nodes increases.
[Chart: throughput of the parallel I/O subsystem vs. number of I/O nodes]
Concurrency and Scalability = Scalable All-to-All Communication

Concurrency and scalability (as the number of I/O nodes increases) can be represented by a scalable overall throughput when the number of compute and I/O nodes increases.
[Chart: all-to-all throughput vs. number of I/O and compute nodes]
High level of parallelism and load balance

Balanced distribution across parallel disks must be ensured
for all types of application patterns:
using small or large I/O requests,
continuous or fragmented I/O request patterns.
How is parallelism achieved?

Split the logical file into stripes.
Distribute the stripes cyclically across the subfiles.
[Figure: a logical file striped cyclically over subfiles file1 to file6]
The POSIX-like Interface of Striped File I/O

Using SFIO from MPI: a simple POSIX-like interface.

    #include <mpi.h>
    #include "/usr/local/sfio/mio.h"
    int _main(int argc, char *argv[])
    {
      MFILE *f;
      int r = rank();
      // Collective open operation
      f = mopen("p1/tmp/a.dat;p2/tmp/a.dat;", 5);
      // each process writes 8 to 14 characters at its own position
Proved: if we remove a team from a traffic, new bottlenecks can emerge.
New bottlenecks add additional constraints on the teams of the reduced traffic.
Proved: a liquid schedule can be assembled if we use teams of the reduced traffic (instead of constructing teams of the initial traffic from the remaining transfers).
Proved: a liquid schedule can be assembled by considering only saturated full teams.
Liquid schedule construction speed with our algorithm

[Chart: CPU time in seconds (log scale, 0.001 to 100000) over 362 sample topologies; MILP Cplex method vs. liquid schedule construction algorithm]

360 traffic patterns across the Swiss-Tx network,
up to 32 nodes, up to 1024 transfers.
Comparison of our optimized construction algorithm with the MILP method (optimized for discrete optimization problems).
Carrying real traffic patterns according to liquid schedules

The Swiss-Tx supercomputer cluster network is used for testing aggregate throughputs.
Traffic patterns are carried out according to liquid schedules
and compared with topology-unaware round-robin or random schedules.
Theoretical liquid and round-robin throughputs of 362 traffic samples

362 traffic samples across the Swiss-Tx network, up to 32 nodes.
Traffic carried out according to a round-robin schedule reaches only 1/2 of the potential network capacity.
[Chart: overall throughput (MB/s) of the liquid throughput vs. the round-robin schedule, for samples labeled by number of transfers (nodes), from 0 (00) to 900 (30)]
Throughput of traffic carried out according to liquid schedules

Traffic carried out according to a liquid schedule practically reaches the theoretical throughput.
[Chart: overall throughput (MB/s) for samples labeled by number of transfers (nodes), from 1 (01) to 961 (31); theoretical liquid throughput, measured throughput of a topology-unaware schedule, and measured throughput of a liquid schedule]
In HPC networks, large messages are "copied" across the network, causing congestion.
Arbitrarily transmitted transfers yield a throughput below the theoretical capacity.
Liquid scheduling relies on the network topology and reaches the theoretical liquid throughput of the network.
Liquid schedules can be constructed in less than 0.1 sec for traffic patterns with 1000 transmissions (about 100 nodes).
Future work: dynamic traffic patterns and application in OBS.
Fault-tolerant streaming with capillary routing
Path diversity and Forward Error Correction codes at the packet level
Structure of my talk

The advantages of packet-level FEC in off-line streaming
Solving the difficulties of real-time streaming by multi-path routing
Generating multi-path routing patterns of various path diversity
Level of path diversity and the efficiency of the routing pattern for real-time streaming
Decoding a file with Digital Fountain Codes

A file is divided into packets.
A digital fountain code generates numerous checksum packets.
A sufficient quantity of any checksum packets recovers the file,
just as, when filling your cup, only collecting a sufficient amount of drops matters.
Transmitting large files without feedback across lossy networks using digital fountain codes

The sender transmits the checksum packets instead of the source packets.
Interruptions cause no problems:
the file is recovered once a sufficient number of packets is delivered.
FEC in off-line streaming relies on time stretching.
In real-time streaming the receiver's playback buffering time is limited

While in off-line streaming the data can be held in the receiver buffer,
in real-time streaming the receiver is not permitted to keep data too long in the playback buffer.
Long failures on a single-path route

If the failures are short, then by transmitting a large number of FEC packets the receiver can constantly have, in time, a sufficient number of checksum packets.
If a failure lasts longer than the playback buffering limit, no FEC can protect the real-time communication.
Applicability of FEC in real-time streaming by using path diversity

[Diagram: reliable off-line streaming via time stretching vs. reliable real-time streaming via path diversity, within the playback buffer limit]

Losses can be recovered by extra packets:
received later (in off-line streaming), or
received via another path (in real-time streaming).
Path diversity replaces time stretching.
Creating an axis of multi-path patterns

Intuitively we imagine the path diversity axis as shown.
High diversity decreases the impact of individual link failures, but uses many more links, increasing the overall failure probability.
We must study many multi-path routing patterns of different diversity in order to answer this question.
[Diagram: a path diversity axis from single-path routing toward increasingly multi-path routing]
Capillary routing creates solutions with different levels of path diversity

As a method for obtaining multi-path routing patterns of various path diversity we rely on the capillary routing algorithm.
For any given network and pair of nodes, capillary routing produces, layer by layer, routing patterns of increasing path diversity.
Path diversity = layer of capillary routing.
Capillary routing: introduction

Capillary routing first offers a simple multi-path routing pattern.
At each successive layer it recursively spreads out individual sub-flows of the previous layers.
The path diversity develops as the layer number increases.
The construction relies on LP.
Capillary routing: first layer

First take the shortest-path flow and minimize the maximal load of all links.
This will split the flow over a few parallel routes.
[Diagram: reduce the maximal load of all links]
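The min-max step described above can be written as a small linear program. This is a sketch of the kind of LP the construction relies on; the notation (unit flow f_l per link, source s, sink d) is assumed here, not taken from the slides:

```latex
\begin{aligned}
\text{minimize } \; & M \\
\text{subject to } \; & \sum_{l \in \mathrm{out}(v)} f_l \;-\; \sum_{l \in \mathrm{in}(v)} f_l \;=\;
  \begin{cases} 1 & v = s \text{ (source)} \\ -1 & v = d \text{ (sink)} \\ 0 & \text{otherwise,} \end{cases} \\
& 0 \le f_l \le M \quad \text{for every link } l .
\end{aligned}
```

Minimizing the common bound M on every link's load forces the unit flow from s to d to spread over parallel routes; the links with f_l = M at the optimum are the bottlenecks that the next layer then works around.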
Capillary routing: second layer

Then identify the bottleneck links of the first layer
and minimize the flow of the remaining links.
Continue similarly, until the full routing pattern is discovered layer by layer.
[Diagram: reduce the load of the remaining links]
Capillary Routing Layers

[Figure: a single network and 4 routing patterns of increasing path diversity]
Application model: evaluating the efficiency of path diversity

To evaluate the efficiencies of patterns with different path diversities we rely on an application model where:
the sender uses a constant amount of FEC checksum packets to combat weak losses, and
the sender dynamically increases the number of FEC packets in case of serious failures.
[Figure: an FEC block of source packets followed by redundant packets]
Strong FEC codes are used in case of serious failures

[Figure: packet loss rate = 3% vs. packet loss rate = 30%]

When the packet loss rate observed at the receiver is below the tolerable limit, the sender transmits at its usual rate.
But when the packet loss rate exceeds the tolerable limit, the sender adaptively increases the FEC block size by adding more redundant packets.
Redundancy Overall Requirement

The overall amount of dynamically transmitted redundant packets during the whole communication time is proportional:
to the duration of the communication and the usual transmission rate,
to a single link's failure frequency and its average duration,
and to a coefficient characterizing the given multi-path routing pattern.
Equation for ROR: it depends only on the routing pattern r(l)

ROR = \sum_{l \in L \,\mid\, r(l) > t} \left( \frac{FEC_{r(l)}}{FEC_t} - 1 \right)

where FEC_{r(l)} is the FEC transmission block size in case of the complete failure of link l,
r(l) is the load of link l for the given routing pattern, and
FEC_t is the FEC block size at default streaming (tolerating loss rate t).
ROR coefficient

The smaller the ROR coefficient of a multi-path routing pattern, the better that pattern is as a choice for real-time streaming.
By measuring the ROR coefficient of multi-path routing patterns of different path diversity, we can evaluate the advantages (or disadvantages) of diversification.
Multi-path routing patterns of different diversity are created by the capillary routing algorithm.
ROR as a function of diversity

[Chart: average ROR rating vs. capillarization level (layer 1 to layer 10), with curves for static tolerances of 3.3%, 3.9%, 4.5%, 5.1%, 6.3%, and 7.5%]

Here is ROR as a function of the capillarization level.
It is an average function over 25 different network samples (obtained from MANET).
The constant tolerance of the streaming is 5.1%.
Here is the ROR function for a stream with a static tolerance of 4.5%.
Here are ROR functions for static tolerances from 3.3% to 7.5%.