Network Protocols: Design and Analysis Polly Huang EE NTU http://cc.ee.ntu.edu.tw/~phuang [email protected]
Jan 05, 2016
Multicast Routing
[Deering88b]
Polly Huang, NTU EE 3
Key ideas
• lays the foundation for IP multicast
  – defines the IP multicast service model
  – e.g., best effort, packet based, anonymous group membership
  – compare to ISIS, with its explicit group membership and guaranteed ordering (partial or total)
• several algorithms
  – extended/bridged LANs
  – distance-vector extensions
  – link-state extensions
• cost analysis
Why Multicast
• save bandwidth
• anonymous addressing
Characterizing Groups
• pervasive or dense
  – most LANs have a receiver
• sparse
  – few LANs have receivers
• local
  – inside a single administrative domain
Service Model
• same delivery characteristics as unicast
  – best-effort packet delivery
  – open loop (no built-in congestion/flow control)
• scoping as a control mechanism
• groups identified by a single IP address
• group membership is open
  – anyone can join or leave
  – do security at higher levels
Routing Algorithms
• single spanning tree
  – for bridged LANs
• distance-vector based
• link-state based
Distance-vector Mcast Rtg
• Basic idea: flood and prune
  – flood: send info about new sources everywhere
  – prune: routers will tell us if they don't have receivers
  – routing info is soft state; periodically re-flood (and prune) to refresh this info
  – if no refresh, then the info goes away => easy fault recovery
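The soft-state idea above can be sketched in a few lines. This is a minimal illustration, not any real DVMRP implementation: a prune entry suppresses forwarding of (source, group) traffic on an interface only until it expires, and without a refresh the router falls back to flooding, which is what makes fault recovery easy. All names and the 120-second lifetime are assumptions for the sketch.

```python
import time

PRUNE_LIFETIME = 120.0  # seconds; illustrative, real timers are configurable


class PruneState:
    def __init__(self):
        # (source, group, interface) -> expiry timestamp
        self.prunes = {}

    def record_prune(self, source, group, iface, now=None):
        now = time.time() if now is None else now
        self.prunes[(source, group, iface)] = now + PRUNE_LIFETIME

    def should_forward(self, source, group, iface, now=None):
        now = time.time() if now is None else now
        expiry = self.prunes.get((source, group, iface))
        if expiry is None or now > expiry:
            # no prune, or the prune expired without refresh:
            # fall back to flooding on this interface
            self.prunes.pop((source, group, iface), None)
            return True
        return False
```

A prune that is never refreshed simply times out, so a crashed downstream router cannot leave the tree permanently broken.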
Example Topology
(figure: topology with source s and two LANs with group members g)
Phase 1: Flood using Truncated Broadcast
(figure: s floods via truncated broadcast toward group members g)
truncated broadcast: this router knows it has no groups on its LAN, so it doesn't broadcast
Phase 2: Prune
(figure: routers without receivers send prune (s,g) messages back toward s)
Phase 3: Graft
(figure: a new member sends report (g); graft (s,g) messages travel hop by hop toward s to re-attach the pruned branch)
Phase 4: Steady State
(figure: data flows from s along the pruned tree to group members g)
Sending Data in DVMRP
• Data packets are sent on all branches of the tree
  – send on all interfaces except the one they came in on
• RPF (Reverse Path Forwarding) check:
  – drop packets that arrive on the wrong interface, i.e., not the interface this router would use to reach the source via unicast
  – why? suppress errant packets
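The RPF check is a one-line rule, shown here as a toy sketch. The table shape (a dict mapping a destination to an outgoing interface) is an assumption for illustration; a real router consults its unicast FIB. A packet from source S passes only if it arrived on the interface the router itself would use to reach S, so duplicates and errant packets that loop in via other paths are dropped.

```python
def rpf_check(unicast_table, source, arrival_iface):
    """Return True iff the packet arrived on the interface we would
    use to send unicast traffic back toward its source."""
    expected = unicast_table.get(source)
    return expected == arrival_iface
```

Usage: the forwarding loop runs this before replicating a packet on the tree branches; a failed check means drop, not forward.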
DVMRP Pros and Cons
• Pros:
  – simple
  – works well with many receivers (why? overhead is per-sender; receivers are passive)
• Cons:
  – works poorly with many groups (why? every sender in every group floods the net)
  – works poorly with sparse groups (why? data floods everywhere and is then pruned back, which is expensive if it's only needed in a few places)
Link-state Multicast Routing
• Basic idea: treat group members (receivers) as new links
  – flood info about them to everyone in LSA messages (just like link-state routing)
• Compute next hops for multicast routes on demand (lazily)
  – unlike link-state unicast, where all routes are computed as soon as an LSA arrives
• realized as MOSPF
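The lazy computation can be sketched as follows, under stated assumptions: the topology is an unweighted adjacency map (so BFS stands in for Dijkstra), membership is a group-to-routers map learned from flooded LSAs, and all class/field names are invented for illustration. The point is the cache: a (source, group) tree is computed only when the first data packet for that pair arrives.

```python
from collections import deque


def shortest_path_tree(topology, source):
    """BFS tree over an unweighted topology {node: [neighbors]}.
    Returns {node: parent} for every node reachable from source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in topology.get(node, []):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return parent


class MospfRouter:
    def __init__(self, name, topology, members):
        self.name = name
        self.topology = topology  # link-state map, same at every router
        self.members = members    # group -> set of routers with receivers
        self.cache = {}           # (source, group) -> set of next hops

    def next_hops(self, source, group):
        key = (source, group)
        if key not in self.cache:  # compute lazily, on first data packet
            parent = shortest_path_tree(self.topology, source)
            hops = set()
            for receiver in self.members.get(group, set()):
                # walk from each receiver back toward the source; the
                # node whose parent is this router is one of our next hops
                node = receiver
                while parent.get(node) is not None:
                    if parent[node] == self.name:
                        hops.add(node)
                    node = parent[node]
            self.cache[key] = hops
        return self.cache[key]
```

Because every router holds the same map and runs the same computation, each independently derives a consistent piece of the same source-rooted tree.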
(figure: source S1, receivers R1 and R2 on LANs X and Y, router Z)
Link state: each router floods a link-state advertisement.
Multicast: add membership information to the "link state."
Each router computes the multicast tree for each active source and builds a forwarding entry with an outgoing interface list.
(figure: source S1, receivers at X and Y, routers Z, W, Q, R)
Z has the network map, including membership at X and Y.
Z computes the shortest-path tree from S1 to X and Y (when it gets a data packet for G) and puts it in the routing table.
W, Q, R each do the same thing as data arrives at them.
(figure: topology change near Z and W)
A link-state advertisement with new topology may require re-computation of the tree and forwarding entry (only Z and W send new LSA messages, but all routers on the path recompute).
(figure: new receiver R3 attached via router T)
A link-state advertisement (from T) with new membership (R3) may require incremental computation and the addition of an interface to the outgoing interface list (at Z).
MOSPF Pros and Cons
• Pros:
  – simple add-on to OSPF
  – works well with many senders (why? no per-sender state)
• Cons:
  – works poorly with many receivers (why? per-receiver costs)
  – works poorly with sparse groups (why? lots of info goes places that don't want it)
  – works poorly with large domains (why? link state scales with the number of links; many links cause frequent changes)
PIM
[Deering96a]
Key ideas
• want a multicast routing protocol that works well with sparse groups
• use a single shared tree; fix one node as the rendezvous point (RP)
Why not just DVMRP or MOSPF?
• With sparse groups, both are expensive
  – DVMRP has problems with many senders
  – MOSPF has problems with many receivers
  – neither works well with sparse groups
• Solution: PIM-SM
  – use a rendezvous point (RP) as a place to meet
  – but downsides:
    • single point of failure
    • don't necessarily get shortest paths
    • also concerned about "concentration" of all data going through the RP
New Design Questions
• Where to place RP?• How to make the RP robust?
– don’t want a single point of failure
• How to build the tree given an RP?• How to send data with a shared tree?• What is the overhead of going through RP (a
shared tree)?• How to switch from shared tree to SPT?
Where to place RP?
• the RP is the node to which receivers send join messages
• place it in the core
  – at the edge is more expensive, since traffic must go through it
• optimal placement is NP-hard
Robustness
• a single RP is a single point of failure, so there must be a backup plan
• approach:
  – start with a set of cores (candidate RPs)
  – hash the group name to form an ordered list
    • basic idea: order the RPs; hash(G) selects one; use it
    • if it fails, use hash(G) to find the next one
    • if everyone uses the same hash function, everyone finds the same RPs
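The ordered-RP idea can be sketched in a few lines. The hash construction below (SHA-256 of group name plus candidate name) is illustrative, not PIM's actual mapping; what matters is that every router, given the same candidate set and hash, independently ranks the RPs identically and walks past failed ones to the same survivor.

```python
import hashlib


def rank_rps(group, candidates):
    """Deterministically order candidate RPs for a group."""
    def score(rp):
        return hashlib.sha256((group + rp).encode()).hexdigest()
    return sorted(candidates, key=score, reverse=True)


def select_rp(group, candidates, alive):
    """Pick the first live RP in the group's ranking; all routers
    agree on the result as long as they agree on who is alive."""
    for rp in rank_rps(group, candidates):
        if rp in alive:
            return rp
    return None  # no live RP at all
```

Failover needs no extra protocol: when the chosen RP dies, everyone re-evaluates and converges on the next entry in the same list.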
Building the Shared Tree
• Simply send a join message toward the RP
  – use the unicast routing table to get there
• Add links to the tree as you go
• Stop if you reach a router that's already on the tree
• This yields the reverse shortest path to the RP
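The steps above can be sketched directly. This is an illustrative toy, not a protocol implementation: routers are plain names, the unicast table is a next-hop dict, and (*, G) state is a set of groups per router. The join walks hop by hop toward the RP, installing state, and stops at the first router that already has state for the group.

```python
def send_join(group, start, rp, unicast_next_hop, tree_state):
    """Propagate a join from `start` toward `rp`.
    tree_state: router -> set of groups with (*, G) state installed.
    Returns the router where the join stopped."""
    node = start
    while node != rp:
        if group in tree_state.setdefault(node, set()):
            return node  # already on the shared tree: stop here
        tree_state[node].add(group)  # install (*, G) state as we go
        node = unicast_next_hop[node]  # step toward the RP
    tree_state.setdefault(rp, set()).add(group)
    return rp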
PIM Example: build Shared tree
(figure: R1, R2, R3 send join messages toward the RP; (*, G) state is installed at each router along the way; shared tree after R1, R2, R3 join)
PIM: Sending Data
• If you are on the tree, you just send the packet as with other multicast protocols
  – it follows the tree
• If you are not on the tree (say, you're a sender but not a group member), the packet is tunneled to the RP, which forwards it down the shared tree
  – this is why central placement of the RP is important
PIM Example: sending data on the tree
(figure: R4, already on the tree, sends data; it follows the (*, G) state down the shared tree)
Sending data if not on the tree
(figure: S1 unicasts the encapsulated data packet to the RP in a Register message; the RP decapsulates and forwards it down the shared tree)
What is the cost of the shared tree?
• Some data goes further than it should
  – but latency is bounded by 2x the SPT
• All data goes on one tree, rather than on many trees
  – but there's no guarantee you get multiple paths with source-specific trees anyway
• To optimize, PIM-SM also supports source-specific trees
Build source-specific tree
(figure: join messages sent toward S1 build a source-specific (S1, G) tree for a high-data-rate source, alongside the RP's (*, G) distribution tree)
Forward packets on “longest-match” entry
(figure: both trees installed; the source-specific (S1, G) entry is a "longer match" for source S1 than the shared-tree (*, G) entry, which can be used by any source)
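The longest-match rule reduces to a two-step lookup, sketched here with an illustrative table layout (a dict keyed by (source, group) tuples, with "*" as the wildcard source): prefer the source-specific (S, G) entry, and fall back to the shared-tree (*, G) entry only when none exists.

```python
def lookup(forwarding_table, source, group):
    """Return the outgoing interface list for a packet, preferring
    the more specific (S, G) entry over the shared (*, G) entry."""
    entry = forwarding_table.get((source, group))
    if entry is None:
        entry = forwarding_table.get(("*", group))
    return entry
```

So packets from S1 ride the source-specific tree while every other sender's packets still follow the shared tree, with no per-packet flag needed.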
SPT and Shared Trees
• Many more details to be careful about
  – need to handle the switchover from shared tree to SPT gracefully
  – need to support pruning for both the SPT and the shared tree
• and have to worry about LANs with multiple routers, multiple senders, etc.
• Uses similar mechanisms (soft state, refresh, etc.), but lots of details
PIM-SM observations
• does a good job at intra-domain multicast routing that scales to
  – many senders
  – many receivers
  – many groups
  – large bandwidth
• preserves the original (simple) service model
• but quite complex
• but actually implemented today
Multi-AS Mcast Routing
• Fine, PIM-SM (or DVMRP or MOSPF) works inside an AS, but what about between ASes?
  – lots of policy questions
  – and have to show ISPs why they should deploy (how they can make money :-)
  – and convince them the world won't end
    • multicast, that's for high-bandwidth video, right?
    • multicast can flood all my links with data, right?
    • what apps, again?
MSDP
• Support for inter-domain PIM-SM
• Temporary solution
• Basic approach:
  – send all sources to all ASes (like the original flood-and-prune)
  – AS border routers are PIM-SM RPs for their domain
But does this seem complicated?
• some people thought so
• and commercial deployment has been slow
• if we change the service model, maybe we can greatly simplify things
  – and make it easier for ISPs to understand how to change/manage multicast
Express
[Holbrook99a]
Key ideas
• use channels: a single sender, many subscribers
  – makes the multicast tree easier to configure
  – easier to tell who can send
• add mechanism to let you count subscribers
• easier to think about billing
• goal: define a simpler model
Multicast Problems
• need a billing mechanism
  – need to know the number of subscribers
• need access control
  – need to limit who can send and subscribe
  – ISPs concerned about multicast
• IPv4 multicast addresses too limited
• current protocols too complex
=> single-source multicast
Express vs. Multicast Problems
• need a billing mechanism
  – record sources
  – count receivers
• need access control
  – only the source can send
• IPv4 multicast addresses too limited
  – address is the (source, group) pair
Express Approach
• all addresses are source-specific (S, E)
  – 2^24 channels per source (2^32 sources)
• access control
  – only the source can send
  – channels optionally protected by a "key" (really just a secret)
• sub-cast support (encapsulate a packet to any router on the tree, if you know who they are)
• best-effort counting service
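The (S, E) address can be sketched as a toy encoding: a 32-bit source address concatenated with a 24-bit channel extension, giving each source its own 2^24 channels. The packing into one integer and the function names are assumptions for illustration, as is the simple only-the-source-may-send check that mirrors the access-control rule above.

```python
def channel_id(source_addr, extension):
    """Pack a 32-bit source address and a 24-bit channel extension
    into one channel identifier (illustrative encoding)."""
    assert 0 <= source_addr < 2**32
    assert 0 <= extension < 2**24
    return (source_addr << 24) | extension


def may_send(packet_source, channel_source):
    """Express access control: only the channel's source may send."""
    return packet_source == channel_source
```

Because the source address is part of the channel name, there is no global address allocation problem: two sources can use the same extension without colliding.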
Express Components
• ECMP: Express Count Management Protocol
  – like IGMP, but adds counting support
  – counts used to determine the number of receivers, or for other things like voting
    • not clear how general this is
• session relays
  – a service at the source that can relay data onto the tree (similar to PIM tunneling)
Observations
• Simpler? yes
• Enough to justify multicast to ISPs? not clear
Another Alternative: Application-level Multicast
• if the ISPs won’t give us multicast, we’ll take it :-)
• just do it all at the application layer
• results in some duplicated data on links
• and the app doesn't have direct access to unicast routing
• but it can work… (e.g., the Yoid project at ISI)
Application-level Multicast Example
(figure: application-level relays rooted at Src forward copies over unicast)
App-level Multicast
• Simplest approach:
  – send data to a central site that forwards it
• Better approaches:
  – try to balance load across links
  – try to topologically cluster relays
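The simplest central-site scheme is a few lines, sketched here as a toy (class and method names are invented): members register with a relay, and the relay re-sends every packet to all other members over unicast. This makes the duplicated-data cost concrete: a group of n members costs the relay n-1 unicast copies per packet, which is exactly what the better, topology-aware approaches try to reduce.

```python
class CentralRelay:
    """Baseline application-level multicast: one relay, all unicast."""

    def __init__(self):
        self.members = []

    def join(self, member):
        self.members.append(member)

    def publish(self, sender, packet):
        # unicast a copy to every member except the sender itself
        return [(m, packet) for m in self.members if m != sender]
```

The relay needs no router support at all, which is the whole appeal: it works over today's unicast Internet.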
Questions?