Network Protocols: Design and Analysis Polly Huang EE NTU http://cc.ee.ntu.edu.tw/~phuang [email protected]
Jan 05, 2016
Multicast Routing
[Deering88b]
Polly Huang, NTU EE 3
Key ideas
• lays the foundation for IP multicast
  – defines the IP multicast service model
  – e.g., best effort, packet based, anonymous group membership
  – compare to ISIS, with its explicit group membership and guaranteed ordering (partial or total)
• several algorithms
  – extended/bridged LANs
  – distance-vector extensions
  – link-state extensions
• cost analysis
Why Multicast
• save bandwidth
• anonymous addressing
Characterizing Groups
• pervasive or dense
  – most LANs have a receiver
• sparse
  – few LANs have receivers
• local
  – inside a single administrative domain
Service Model
• same delivery characteristics as unicast
  – best-effort packet delivery
  – open loop (no built-in congestion/flow control)
• scoping as a control mechanism
• groups identified by a single IP address
• group membership is open
  – anyone can join or leave
  – do security at higher levels
Routing Algorithms
• single spanning tree
  – for bridged LANs
• distance-vector based
• link-state based
Distance-vector Mcast Rtg
• Basic idea: flood and prune
  – flood: send info about new sources everywhere
  – prune: routers will tell us if they don't have receivers
  – routing info is soft state; periodically re-flood (and prune) to refresh this info
  – if no refresh, then the info goes away => easy fault recovery
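The soft-state idea above can be sketched in a few lines. This is a minimal illustration, not any real DVMRP implementation: a prune entry suppresses forwarding of (source, group) traffic on an interface only until it expires, and without a refresh the router falls back to flooding, which is what makes fault recovery easy. All names and the 120-second lifetime are assumptions for the sketch.

```python
import time

PRUNE_LIFETIME = 120.0  # seconds; illustrative, real timers are configurable


class PruneState:
    def __init__(self):
        # (source, group, interface) -> expiry timestamp
        self.prunes = {}

    def record_prune(self, source, group, iface, now=None):
        now = time.time() if now is None else now
        self.prunes[(source, group, iface)] = now + PRUNE_LIFETIME

    def should_forward(self, source, group, iface, now=None):
        now = time.time() if now is None else now
        expiry = self.prunes.get((source, group, iface))
        if expiry is None or now > expiry:
            # no prune, or the prune expired without refresh:
            # fall back to flooding on this interface
            self.prunes.pop((source, group, iface), None)
            return True
        return False
```

A prune that is never refreshed simply times out, so a crashed downstream router cannot leave the tree permanently broken.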
Example Topology
(figure: topology with source s and two LANs with group members g)
Phase 1: Flood using Truncated Broadcast
(figure: s floods via truncated broadcast toward group members g)
truncated broadcast: this router knows it has no groups on its LAN, so it doesn't broadcast
Phase 2: Prune
(figure: routers without receivers send prune (s,g) messages back toward s)
Phase 3: Graft
(figure: a new member sends report (g); graft (s,g) messages travel hop by hop toward s to re-attach the pruned branch)
Phase 4: Steady State
(figure: data flows from s along the pruned tree to group members g)
Sending Data in DVMRP
• Data packets are sent on all branches of the tree
  – send on all interfaces except the one they came in on
• RPF (Reverse Path Forwarding) check:
  – drop packets that arrive on the wrong interface, i.e., not the interface this router would use to reach the source via unicast
  – why? suppress errant packets
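The RPF check is a one-line rule, shown here as a toy sketch. The table shape (a dict mapping a destination to an outgoing interface) is an assumption for illustration; a real router consults its unicast FIB. A packet from source S passes only if it arrived on the interface the router itself would use to reach S, so duplicates and errant packets that loop in via other paths are dropped.

```python
def rpf_check(unicast_table, source, arrival_iface):
    """Return True iff the packet arrived on the interface we would
    use to send unicast traffic back toward its source."""
    expected = unicast_table.get(source)
    return expected == arrival_iface
```

Usage: the forwarding loop runs this before replicating a packet on the tree branches; a failed check means drop, not forward.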
DVMRP Pros and Cons
• Pros:
  – simple
  – works well with many receivers (why? overhead is per-sender; receivers are passive)
• Cons:
  – works poorly with many groups (why? every sender in every group floods the net)
  – works poorly with sparse groups (why? data floods everywhere and is then pruned back, which is expensive if it's only needed in a few places)
Link-state Multicast Routing
• Basic idea: treat group members (receivers) as new links
  – flood info about them to everyone in LSA messages (just like link-state routing)
• Compute next hops for multicast routes on demand (lazily)
  – unlike link-state unicast, where all routes are computed as soon as an LSA arrives
• realized as MOSPF
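The lazy computation can be sketched as follows, under stated assumptions: the topology is an unweighted adjacency map (so BFS stands in for Dijkstra), membership is a group-to-routers map learned from flooded LSAs, and all class/field names are invented for illustration. The point is the cache: a (source, group) tree is computed only when the first data packet for that pair arrives.

```python
from collections import deque


def shortest_path_tree(topology, source):
    """BFS tree over an unweighted topology {node: [neighbors]}.
    Returns {node: parent} for every node reachable from source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in topology.get(node, []):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return parent


class MospfRouter:
    def __init__(self, name, topology, members):
        self.name = name
        self.topology = topology  # link-state map, same at every router
        self.members = members    # group -> set of routers with receivers
        self.cache = {}           # (source, group) -> set of next hops

    def next_hops(self, source, group):
        key = (source, group)
        if key not in self.cache:  # compute lazily, on first data packet
            parent = shortest_path_tree(self.topology, source)
            hops = set()
            for receiver in self.members.get(group, set()):
                # walk from each receiver back toward the source; the
                # node whose parent is this router is one of our next hops
                node = receiver
                while parent.get(node) is not None:
                    if parent[node] == self.name:
                        hops.add(node)
                    node = parent[node]
            self.cache[key] = hops
        return self.cache[key]
```

Because every router holds the same map and runs the same computation, each independently derives a consistent piece of the same source-rooted tree.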
(figure: source S1, receivers R1 and R2 on LANs X and Y, router Z)
Link state: each router floods a link-state advertisement.
Multicast: add membership information to the "link state."
Each router computes the multicast tree for each active source and builds a forwarding entry with an outgoing interface list.
(figure: source S1, receivers at X and Y, routers Z, W, Q, R)
Z has the network map, including membership at X and Y.
Z computes the shortest-path tree from S1 to X and Y (when it gets a data packet for G) and puts it in the routing table.
W, Q, R each do the same thing as data arrives at them.
(figure: topology change near Z and W)
A link-state advertisement with new topology may require re-computation of the tree and forwarding entry (only Z and W send new LSA messages, but all routers on the path recompute).
(figure: new receiver R3 attached via router T)
A link-state advertisement (from T) with new membership (R3) may require incremental computation and the addition of an interface to the outgoing interface list (at Z).
MOSPF Pros and Cons
• Pros:
  – simple add-on to OSPF
  – works well with many senders (why? no per-sender state)
• Cons:
  – works poorly with many receivers (why? per-receiver costs)
  – works poorly with sparse groups (why? lots of info goes places that don't want it)
  – works poorly with large domains (why? link state scales with the number of links; many links cause frequent changes)
PIM
[Deering96a]
Key ideas
• want a multicast routing protocol that works well with sparse groups
• use a single shared tree; fix one node as the rendezvous point (RP)
Why not just DVMRP or MOSPF?
• With sparse groups, both are expensive
  – DVMRP has problems with many senders
  – MOSPF has problems with many receivers
  – neither works well with sparse groups
• Solution: PIM-SM
  – use a rendezvous point (RP) as a place to meet
  – but downsides:
    • single point of failure
    • don't necessarily get shortest paths
    • also concerned about "concentration" of all data going through the RP
New Design Questions
• Where to place RP?• How to make the RP robust?
– don’t want a single point of failure
• How to build the tree given an RP?• How to send data with a shared tree?• What is the overhead of going through RP (a
shared tree)?• How to switch from shared tree to SPT?
Where to place RP?
• the RP is the node to which receivers send join messages
• place it in the core
  – at the edge is more expensive, since traffic must go through it
• optimal placement is NP-hard
Robustness
• a single RP is a single point of failure, so there must be a backup plan
• approach:
  – start with a set of cores (candidate RPs)
  – hash the group name to form an ordered list
    • basic idea: order the RPs; hash(G) selects one; use it
    • if it fails, use hash(G) to find the next one
    • if everyone uses the same hash function, everyone finds the same RPs
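The ordered-RP idea can be sketched in a few lines. The hash construction below (SHA-256 of group name plus candidate name) is illustrative, not PIM's actual mapping; what matters is that every router, given the same candidate set and hash, independently ranks the RPs identically and walks past failed ones to the same survivor.

```python
import hashlib


def rank_rps(group, candidates):
    """Deterministically order candidate RPs for a group."""
    def score(rp):
        return hashlib.sha256((group + rp).encode()).hexdigest()
    return sorted(candidates, key=score, reverse=True)


def select_rp(group, candidates, alive):
    """Pick the first live RP in the group's ranking; all routers
    agree on the result as long as they agree on who is alive."""
    for rp in rank_rps(group, candidates):
        if rp in alive:
            return rp
    return None  # no live RP at all
```

Failover needs no extra protocol: when the chosen RP dies, everyone re-evaluates and converges on the next entry in the same list.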
Building the Shared Tree
• Simply send a join message toward the RP
  – use the unicast routing table to get there
• Add links to the tree as you go
• Stop if you reach a router that's already on the tree
• This yields the reverse shortest path to the RP
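The steps above can be sketched directly. This is an illustrative toy, not a protocol implementation: routers are plain names, the unicast table is a next-hop dict, and (*, G) state is a set of groups per router. The join walks hop by hop toward the RP, installing state, and stops at the first router that already has state for the group.

```python
def send_join(group, start, rp, unicast_next_hop, tree_state):
    """Propagate a join from `start` toward `rp`.
    tree_state: router -> set of groups with (*, G) state installed.
    Returns the router where the join stopped."""
    node = start
    while node != rp:
        if group in tree_state.setdefault(node, set()):
            return node  # already on the shared tree: stop here
        tree_state[node].add(group)  # install (*, G) state as we go
        node = unicast_next_hop[node]  # step toward the RP
    tree_state.setdefault(rp, set()).add(group)
    return rp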
PIM Example: build Shared tree
(figure: R1, R2, R3 send join messages toward the RP; (*, G) state is installed at each router along the way; shared tree after R1, R2, R3 join)
PIM: Sending Data
• If you are on the tree, you just send the packet as with other multicast protocols
  – it follows the tree
• If you are not on the tree (say, you're a sender but not a group member), the packet is tunneled to the RP, which forwards it down the shared tree
  – this is why central placement of the RP is important
PIM Example: sending data on the tree
(figure: R4, already on the tree, sends data; it follows the (*, G) state down the shared tree)
Sending data if not on the tree
(figure: S1 unicasts the encapsulated data packet to the RP in a Register message; the RP decapsulates and forwards it down the shared tree)
What is the cost of the shared tree?
• Some data goes further than it should
  – but latency is bounded by 2x the SPT
• All data goes on one tree, rather than on many trees
  – but there's no guarantee you get multiple paths with source-specific trees anyway
• To optimize, PIM-SM also supports source-specific trees
Build source-specific tree
(figure: join messages sent toward S1 build a source-specific (S1, G) tree for a high-data-rate source, alongside the RP's (*, G) distribution tree)
Forward packets on “longest-match” entry
(figure: both trees installed; the source-specific (S1, G) entry is a "longer match" for source S1 than the shared-tree (*, G) entry, which can be used by any source)
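The longest-match rule reduces to a two-step lookup, sketched here with an illustrative table layout (a dict keyed by (source, group) tuples, with "*" as the wildcard source): prefer the source-specific (S, G) entry, and fall back to the shared-tree (*, G) entry only when none exists.

```python
def lookup(forwarding_table, source, group):
    """Return the outgoing interface list for a packet, preferring
    the more specific (S, G) entry over the shared (*, G) entry."""
    entry = forwarding_table.get((source, group))
    if entry is None:
        entry = forwarding_table.get(("*", group))
    return entry
```

So packets from S1 ride the source-specific tree while every other sender's packets still follow the shared tree, with no per-packet flag needed.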
SPT and Shared Trees
• Many more details to be careful about
  – need to handle the switchover from shared tree to SPT gracefully
  – need to support pruning for both the SPT and the shared tree
• and have to worry about LANs with multiple routers, multiple senders, etc.
• Uses similar mechanisms (soft state, refresh, etc.), but lots of details
PIM-SM observations
• does a good job at intra-domain multicast routing that scales to
  – many senders
  – many receivers
  – many groups
  – large bandwidth
• preserves the original (simple) service model
• but quite complex
• but actually implemented today
Multi-AS Mcast Routing
• Fine, PIM-SM (or DVMRP or MOSPF) works inside an AS, but what about between ASes?
  – lots of policy questions
  – and have to show ISPs why they should deploy (how they can make money :-)
  – and convince them the world won't end
    • multicast, that's for high-bandwidth video, right?
    • multicast can flood all my links with data, right?
    • what apps, again?
MSDP
• Support for inter-domain PIM-SM
• Temporary solution
• Basic approach:
  – send all sources to all ASes (like the original flood-and-prune)
  – AS border routers are PIM-SM RPs for their domain
But does this seem complicated?
• some people thought so
• and commercial deployment has been slow
• if we change the service model, maybe we can greatly simplify things
  – and make it easier for ISPs to understand how to change/manage multicast
Express
[Holbrook99a]
Key ideas
• use channels: a single sender, many subscribers
  – makes the multicast tree easier to configure
  – easier to tell who can send
• add mechanism to let you count subscribers
• easier to think about billing
• goal: define a simpler model
Multicast Problems
• need a billing mechanism
  – need to know the number of subscribers
• need access control
  – need to limit who can send and subscribe
  – ISPs concerned about multicast
• IPv4 multicast addresses too limited
• current protocols too complex
=> single-source multicast
Express vs. Multicast Problems
• need a billing mechanism
  – record sources
  – count receivers
• need access control
  – only the source can send
• IPv4 multicast addresses too limited
  – address is the (source, group) pair
Express Approach
• all addresses are source-specific (S, E)
  – 2^24 channels per source (2^32 sources)
• access control
  – only the source can send
  – channels optionally protected by a "key" (really just a secret)
• sub-cast support (encapsulate a packet to any router on the tree, if you know who they are)
• best-effort counting service
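The (S, E) address can be sketched as a toy encoding: a 32-bit source address concatenated with a 24-bit channel extension, giving each source its own 2^24 channels. The packing into one integer and the function names are assumptions for illustration, as is the simple only-the-source-may-send check that mirrors the access-control rule above.

```python
def channel_id(source_addr, extension):
    """Pack a 32-bit source address and a 24-bit channel extension
    into one channel identifier (illustrative encoding)."""
    assert 0 <= source_addr < 2**32
    assert 0 <= extension < 2**24
    return (source_addr << 24) | extension


def may_send(packet_source, channel_source):
    """Express access control: only the channel's source may send."""
    return packet_source == channel_source
```

Because the source address is part of the channel name, there is no global address allocation problem: two sources can use the same extension without colliding.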
Express Components
• ECMP: Express Count Management Protocol
  – like IGMP, but adds counting support
  – counts used to determine the number of receivers, or for other things like voting
    • not clear how general this is
• session relays
  – a service at the source that can relay data onto the tree (similar to PIM tunneling)
Observations
• Simpler? yes
• Enough to justify multicast to ISPs? not clear
Another Alternative: Application-level Multicast
• if the ISPs won’t give us multicast, we’ll take it :-)
• just do it all at the application layer
• results in some duplicated data on links
• and the app doesn't have direct access to unicast routing
• but it can work… (e.g., the Yoid project at ISI)
Application-level Multicast Example
(figure: application-level relays rooted at Src forward copies over unicast)
App-level Multicast
• Simplest approach:
  – send data to a central site that forwards it
• Better approaches:
  – try to balance load across links
  – try to topologically cluster relays
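The simplest central-site scheme is a few lines, sketched here as a toy (class and method names are invented): members register with a relay, and the relay re-sends every packet to all other members over unicast. This makes the duplicated-data cost concrete: a group of n members costs the relay n-1 unicast copies per packet, which is exactly what the better, topology-aware approaches try to reduce.

```python
class CentralRelay:
    """Baseline application-level multicast: one relay, all unicast."""

    def __init__(self):
        self.members = []

    def join(self, member):
        self.members.append(member)

    def publish(self, sender, packet):
        # unicast a copy to every member except the sender itself
        return [(m, packet) for m in self.members if m != sender]
```

The relay needs no router support at all, which is the whole appeal: it works over today's unicast Internet.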
Questions?