Computer networks
Lecture 12: Overlay network
Prof. Younghee Lee
Some parts of these teaching materials were prepared with reference to the lecture notes by Prof. Ion Stoica ([email protected])
Active Network: Introduction
• Definition
  – An active network makes the network infrastructure programmable in order to support customized communication (and computation)
• Active network model
  – Packets can change the behavior of the switches "on the fly"
    » In-band active network
    » Out-of-band active extensions
• Motivation
  – Faster response to problems and possibilities in the network
  – Accelerates network evolution
  – Examples
    » Web proxy caching
    » Auctions
    » Reliable multicast
    » Congestion control
Architecture
• Active node architecture
  – User programs
  – Execution Environments (EEs)
  – Node Operating System (NodeOS)
  – Security architecture
[Figure: two active nodes, each running applications on execution environments EE1 and EE2 over a NodeOS, connected by transmission facilities]
Active Networks vs. Overlay Networks
• Key difference:
  – Active nodes operate at the network layer; overlay nodes operate at the application layer
• Active network advantages:
  – Efficiency: no need to tunnel packets; no need to process packets at layers above the network layer
• Overlay network advantages:
  – Easier to deploy: no need to integrate overlay nodes into the network infrastructure
    » Active nodes have to collaborate with (and be trusted by) the other routers in the same AS (they need to exchange routing info)
Conclusions
• Active networks
  – A revolutionary paradigm
  – Explores a significant region of the networking architecture design space
• But is the network layer the right level to deploy it?
  – Maybe, but only if all (congested) routers are active…
  – Otherwise, overlays might be good enough…
Overlay network
• An isolated virtual network deployed over an existing network
• Composed of
  – Hosts, routers
  – Tunnels
    » Paths in the base network
    » Links in the overlay network
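The relationship between tunnels, base-network paths, and overlay links can be sketched in a few lines of Python. This is only an illustration: the node names (H1, H2, R1 to R3) and the helper functions `is_valid_tunnel` and `base_hops` are hypothetical and do not come from the lecture.

```python
# Hypothetical base network: hosts H1, H2 and routers R1, R2, R3.
# Links are undirected, so each one is stored as a frozenset of its endpoints.
base_links = {
    frozenset(("H1", "R1")),
    frozenset(("R1", "R2")),
    frozenset(("R2", "R3")),
    frozenset(("R3", "H2")),
}

# Each overlay link (tunnel) is stored together with the base-network path it follows.
tunnels = {
    ("H1", "H2"): ["H1", "R1", "R2", "R3", "H2"],
}


def is_valid_tunnel(path, links):
    """Return True if every consecutive hop of the path is a base-network link."""
    return all(frozenset(hop) in links for hop in zip(path, path[1:]))


def base_hops(path):
    """Number of base-network links hidden behind one overlay link."""
    return len(path) - 1


for (src, dst), path in tunnels.items():
    print(src, dst, is_valid_tunnel(path, base_links), base_hops(path))
```

From the overlay's point of view H1 and H2 are direct neighbors, even though the tunnel crosses four base-network links.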
Benefits
• Do not have to deploy new equipment or modify existing software/protocols
• Do not have to deploy at every node
  – Not every node needs/wants overlay network service all the time
    » e.g., QoS guarantees for best-effort traffic
  – Overlay network may be too heavyweight for some nodes
    » e.g., consumes too much memory, cycles, or bandwidth
  – Overlay network may have unclear security properties
    » e.g., may be used for denial-of-service attacks
  – Overlay network may not scale (not exactly a benefit)
    » e.g., may require n^2 state or communication
Applications of overlay network
• Mobility
  – MIPv4: pretends mobile host is in home network
• Routing
• Quality of Service
• Addressing
• Security
• Multicast
Unicasting vs. IP Multicasting vs. ALM
• Unicasting: replication at sender (4 copies in the example)
• IP Multicasting: replication at routers
• Application-Level Multicasting (ALM): replication at end hosts
Application Layer Multicast
• End host multicast
• Tree
  – Overcast, Yoid, Jungle Monkey, ALMI
  – NICE, CoopNet, SpreadIt, ZIGZAG
• Mesh
  – Narada, Scattercast
• Multi-tree
  – SplitStream
Mesh-first Approach
• Members are connected to form a richly connected graph, termed a mesh
• Members exchange information on the mesh
• Shortest-path spanning trees of the mesh are constructed with routing protocols, e.g. DVMRP
• Mesh-first approach components
  – Initial join
    » Learns other members' locations
  – Mesh formation
    » Partition avoidance
  – Mesh maintenance
    » Adaptive to network dynamics
    » Improves the mesh quality
  – Multicast tree formation
    » Constructs per-source spanning trees with a routing protocol
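The last step, building a per-source shortest-path tree on top of the mesh, can be illustrated with a small Python sketch. Note the substitution: DVMRP is a distributed distance-vector protocol, whereas this sketch uses a centralized Dijkstra computation, which yields the same kind of shortest-path tree but is not how the protocol itself runs. The mesh, its delays, and the function name `shortest_path_tree` are made up for illustration.

```python
import heapq


def shortest_path_tree(mesh, source):
    """Compute a shortest-path tree rooted at `source` over a weighted mesh.

    `mesh` maps each member to a dict {neighbor: delay}.
    Returns (parent, dist): each member's parent in the tree and its delay from the source.
    """
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path to u was already found
        for v, w in mesh[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent, dist


# Hypothetical four-member mesh with symmetric delays.
mesh = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
parent, dist = shortest_path_tree(mesh, "A")
print(parent)
print(dist)
```

Running the function once per source gives one tree per sender, which is what "per-source" means on the slide.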
Mesh-first Examples
• Narada
  – Creates a mesh and then builds multicast trees with the DVMRP algorithm
• Scattercast
  – Proxy servers are placed at strategic locations; these proxy servers self-organize into multicast trees
  – Uses a mesh like Narada, but differs in protocol details
Tree-first Approach
• Constructs a multicast tree directly
• Members explicitly select their parents
• A single multicast tree is constructed
• Tree-first approach components
  – Initial join
    » Learns other members' locations
  – Multicast tree formation
    » Loop avoidance and partition avoidance
  – Multicast tree maintenance
    » Adaptive to network dynamics
Tree-first Approach Examples
• Overcast
  – Builds a single-source multicast tree that maximizes the bandwidth from the source to the receivers
• Yoid
  – A tree is constructed for data delivery, while a mesh is constructed for exchanging control messages
  – Uses a shared tree among participating members
  – Distributed heuristics for managing and optimizing tree construction
• Jungle Monkey
  – Builds a single-source multicast tree for file transfer
• ALMI
  – Builds a single-source multicast tree at a single server and then distributes it
Performance Metrics
• Application perspectives
  – Bandwidth and latency, startup time, end-host resource usage
• Network perspectives
  – Resource usage
  – Stress of physical links
  – Protocol overhead
• Adaptiveness to network dynamics
• Failure tolerance
• Scalability
Resource Usages
• Bandwidth
• Sum of the costs (e.g. delay) of the overlay links
[Figure: example topology with nodes 1-4, A, and B, and link costs 1, 2, 25, and 27]
• Resource usage: 2 + 27 + 2 = 31
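The sum on this slide can be reproduced with a minimal Python sketch. The three overlay-link costs (2, 27, 2) are taken from the slide; which pair of end hosts each link connects is not recoverable from the text, so the endpoints below are assumptions, as is the function name `resource_usage`.

```python
def resource_usage(overlay_links):
    """Sum the cost (e.g. delay) of every overlay link in the multicast tree."""
    return sum(cost for _, _, cost in overlay_links)


# (endpoint, endpoint, cost); the endpoints are assumed, the costs come from the slide.
tree = [("1", "2", 2), ("2", "3", 27), ("3", "4", 2)]
print(resource_usage(tree))  # 31
```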
Stresses of Physical Links
• Number of identical copies of a packet that traverse a physical link
• Example: the stress of physical link 1-A is 2
[Figure: example topology with nodes 1-4, A, and B]
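Link stress can be computed by counting, for every physical link, how many overlay links route the same packet across it. The sketch below is illustrative only: the physical paths are assumptions chosen so that link 1-A ends up with stress 2, matching the slide, and `link_stress` is a hypothetical helper.

```python
from collections import Counter


def link_stress(overlay_paths):
    """Count how many overlay links carry a copy of the packet over each physical link.

    `overlay_paths` is a list of physical paths, one per overlay link in the tree.
    Physical links are treated as undirected.
    """
    stress = Counter()
    for path in overlay_paths:
        for u, v in zip(path, path[1:]):
            stress[frozenset((u, v))] += 1
    return stress


# Assumed example: host 1 sends to hosts 2 and 3, and host 3 forwards to host 4.
paths = [
    ["1", "A", "2"],
    ["1", "A", "B", "3"],
    ["3", "B", "4"],
]
stress = link_stress(paths)
print(stress[frozenset(("1", "A"))])  # 2
```

With IP multicast every physical link would carry exactly one copy, so a stress above 1 is the price of replicating at end hosts.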
Adaptiveness to Network Dynamics
• Discover
  – Duration from node/link failure to detection of link degradation
• React
  – Duration from detection of link degradation to the first change of the multicast tree
• Repair
  – Duration from the first change of the multicast tree to the change that fully recovers the multicast tree quality
[Figure: timeline divided into discover time, react time, and repair time by the events "detected", "first attempt", and "last attempt"]
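The three intervals are simple differences between four event timestamps. A minimal sketch, assuming timestamps in seconds and a hypothetical function name `adaptation_times`:

```python
def adaptation_times(t_failure, t_detected, t_first_change, t_last_change):
    """Split the recovery from a failure into discover, react, and repair times."""
    return {
        "discover": t_detected - t_failure,
        "react": t_first_change - t_detected,
        "repair": t_last_change - t_first_change,
    }


# Made-up timestamps (seconds) for one failure event.
times = adaptation_times(10.0, 12.5, 13.0, 18.0)
print(times)
```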
Narada
• Narada [Yang-hua Chu et al., 2000]
  – Multi-source multicast
  – Involves only end hosts
  – Small group sizes (up to hundreds of nodes)
  – Typical application: chat
Narada: End System Multicast
[Figure: end hosts Stan1 and Stan2 (Stanford), Berk1 and Berk2 (Berkeley), Gatech, and CMU on the physical network, and the overlay tree built among them]
Overlay Tree
• Ideally,
  – The delay between the source and receivers is small
  – The number of redundant packets on any physical link is low
• Heuristic:
  – Every member in the tree has a small degree
  – Degree chosen to reflect bandwidth of connection to Internet
[Figure: three overlay trees over CMU, Gatech, Stan1, Stan2, Berk1, and Berk2: an "efficient" overlay, a high-degree (unicast) overlay, and a high-latency overlay]
Overlay Construction Problems
• Dynamic changes in group membership
  – Members may join and leave dynamically
  – Members may die
• Dynamic changes in network conditions and topology
  – Delay between members may vary over time due to congestion and routing changes
• Knowledge of network conditions is member specific
  – Each member must determine network conditions for itself
Solution
• Two-step design
  – Build a mesh that includes all participating end hosts
    » What they call a mesh is just a graph
    » Members probe each other to learn network-related information
    » Overlay must self-improve as more information becomes available
  – Build source routed distribution trees
• Source routed minimum spanning tree on mesh
• Desired properties:
  – Members have low degree
  – Small delays from source to receivers
[Figure: a mesh among Berk1, Berk2, Gatech, Stan1, Stan2, and CMU, and a distribution tree built on it]
Narada Components/Techniques
• Mesh management:
  – Ensures the mesh remains connected in the face of membership changes
• Mesh optimization:
  – Distributed heuristics for ensuring that the shortest-path delay between members along the mesh is small
• Tree construction:
  – Routing algorithms for constructing data-delivery trees
  – Distance vector routing and reverse path forwarding
Optimizing Mesh Quality
• Members periodically probe other members at random
• New link added if Utility_Gain of adding link > Add_Threshold
• Members periodically monitor existing links
• Existing link dropped if Cost of dropping link < Drop_Threshold
[Figure: mesh of Berk1, Stan1, Stan2, CMU, Gatech1, and Gatech2. A poor overlay topology: long path from Gatech2 to CMU]
Definitions
• Utility gain of adding a link is based on
  – The number of members to which routing delay improves
  – How significant the improvement in delay to each member is
• Cost of dropping a link is based on
  – The number of members to which routing delay increases, for either neighbor
• Add/Drop thresholds are functions of:
  – Member's estimation of group size
  – Current and maximum degree of member in the mesh
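One way to combine the two criteria for utility gain (how many members improve, and by how much) is to sum the relative delay improvement over all members that would be reached faster through the candidate link. The sketch below makes that assumption explicit: the relative-improvement formula, the threshold value, and all names (`utility_gain`, `add_threshold`) are illustrative and are not specified on the slides.

```python
def utility_gain(current_delay, link_delay, neighbor_delay):
    """Estimate the gain of adding a mesh link to a probed member.

    current_delay:  {member: routing delay over the current mesh}
    link_delay:     measured delay of the candidate link to the probed member
    neighbor_delay: {member: routing delay from the probed member to that member}
    Each member whose delay improves contributes its relative improvement (0..1).
    """
    gain = 0.0
    for member, current in current_delay.items():
        via_link = link_delay + neighbor_delay.get(member, float("inf"))
        if via_link < current:
            gain += (current - via_link) / current
    return gain


# Made-up delays in milliseconds.
current_delay = {"CMU": 100, "Gatech1": 80, "Stan1": 20}
neighbor_delay = {"CMU": 40, "Gatech1": 30, "Stan1": 40}
gain = utility_gain(current_delay, link_delay=10, neighbor_delay=neighbor_delay)

add_threshold = 0.8  # assumed value; the slides derive it from group size and degree
print(gain, gain > add_threshold)
```

Here the delay to CMU and Gatech1 is halved while Stan1 does not improve, so the gain is 1.0 and the link would be added under the assumed threshold.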
Desirable properties of heuristics
• Stability: a dropped link will not be immediately re-added
• Partition avoidance: a partition of the mesh is unlikely to be caused as a result of any single link being dropped
[Figure: two probes on the mesh of Berk1, Stan1, Stan2, CMU, Gatech1, and Gatech2]
  – Delay improves to Stan1 and CMU, but only marginally: do not add link!
  – Delay improves to CMU and Gatech1, and significantly: add link!
Example
[Figure: mesh of Berk1, Stan1, Stan2, CMU, Gatech1, and Gatech2 before and after a link is dropped]
• Link used by Berk1 to reach only Gatech2 and vice versa: drop!
Overcast: Design Goals
• Cisco
• Single source tree
• Uses an infrastructure; end hosts are not part of the multicast tree
• Large groups ~ millions of nodes
• Typical application: content distribution
• Provide application-level multicasting using already existing technology via overcasting
• Scalable, efficient, and reliable distribution of high-quality video
• Compete well against IP multicasting
Overcast
• Overcast: aimed at large groups and high-throughput applications
  – Examples: video streaming, software download
• Deployed as an infrastructure
• Might work out well commercially
• High-definition video over the net
  – CNN, Hollywood studios, Olympics
• Overcast could be a premium service for ISP subscribers (like newsgroups, WWW hosting)
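Tree Building Protocol
• Idea: add a new node as far away from the root as possible without compromising the throughput!
[Figure: example tree rooted at Root with link bandwidths between 0.5 and 1]
Join(new, root) {
    current = root;
    do {
        /* bandwidth the new node would get directly from the current node */
        B = bandwidth(new, current);
        B1 = 0;
        for all n in children(current) {
            B1 = bandwidth(new, n);
            if (B1 >= B) {
                /* a child is about as good: descend one level */
                current = n;
                break;
            }
        }
    } while (B1 >= B);
    /* attach to the deepest node found, not to the root */
    new->parent = current;
}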
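The join procedure above can be turned into a small runnable Python sketch. It follows the slide's logic (descend while some child offers at least as much bandwidth as the current node) and is not the full Overcast protocol. The tree, the bandwidth table, and the function names are hypothetical.

```python
def join(new, root, children, bandwidth):
    """Return the parent chosen for `new`: the deepest node reachable from
    `root` by repeatedly moving to a child that is at least as good."""
    current = root
    while True:
        to_current = bandwidth(new, current)
        for child in children.get(current, []):
            if bandwidth(new, child) >= to_current:
                current = child  # descend one level and re-evaluate
                break
        else:
            # No child matches the bandwidth to the current node: stop here.
            return current


# Hypothetical tree root -> a -> b, and measured bandwidths from the new node.
children = {"root": ["a"], "a": ["b"]}
measured = {("new", "root"): 1.0, ("new", "a"): 1.0, ("new", "b"): 0.5}


def bandwidth(x, y):
    """Look up the (made-up) measured bandwidth between two nodes."""
    return measured[(x, y)]


parent = join("new", "root", children, bandwidth)
print(parent)  # a
```

The new node skips the root because node a is equally good, but it does not descend to b, where its bandwidth would drop from 1.0 to 0.5.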
Details
• A node periodically reevaluates its position by measuring bandwidth to its
  – Siblings
  – Parent
  – Grandparent
• The Up/Down protocol: track membership
  – Each node maintains info about all nodes in its subtree plus a log of changes
    » Memory is cheap
  – Each node periodically sends alive messages to its parent
  – A node propagates info upstream when it
    » Hears from a child for the first time
    » Does not hear from a child for a preset interval
    » Receives updates from its children
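The membership tracking described above can be sketched as a node that records when it last heard from each child and reports changes upstream. This is a simplified illustration, not the actual protocol: the class name, the "birth"/"death" labels for updates, and the explicit `now` parameter are assumptions, and forwarding of updates received from children is left out.

```python
class Node:
    """Tracks direct children via alive messages and reports changes upstream."""

    def __init__(self, name, timeout):
        self.name = name
        self.timeout = timeout  # preset interval after which a silent child is dropped
        self.last_heard = {}    # child -> time of its last alive message
        self.subtree = set()    # known nodes below this one (children only, in this sketch)
        self.log = []           # log of membership changes

    def on_alive(self, child, now):
        """Handle an alive message; return the updates to send to the parent."""
        first_time = child not in self.last_heard
        self.last_heard[child] = now
        if not first_time:
            return []
        self.subtree.add(child)
        update = ("birth", child)
        self.log.append(update)
        return [update]

    def check_children(self, now):
        """Drop children that have been silent for longer than the timeout."""
        updates = []
        for child, heard in list(self.last_heard.items()):
            if now - heard > self.timeout:
                del self.last_heard[child]
                self.subtree.discard(child)
                updates.append(("death", child))
        self.log.extend(updates)
        return updates


node = Node("n1", timeout=5)
print(node.on_alive("c1", now=0))
print(node.on_alive("c1", now=3))
print(node.check_children(now=9))
```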
Details
• Problem: the root is a single point of failure
• Solution: replicate the root to have a backup source
• Problem: only the root maintains complete info about the tree; a protocol is also needed to replicate this info
• Elegant solution: maintain a tree in which the first levels have degree one
  – Advantage: all nodes at these levels maintain full info about the tree
  – Disadvantage: may increase delay, but this is not important for the applications supported by Overcast
• Some results
  – Network load < twice the load of IP multicast (600-node network)
  – Convergence: a 600-node network converges in ~45 rounds
[Figure: the top of the tree is a chain of nodes maintaining full status info about the tree]
CoolStreaming/DONet
• DONet: Data-driven Overlay Network
• CoolStreaming: Cooperative Overlay Streaming
  – First release (CoolStreaming v0.9): May 2004
  – Until March 2005
    » Downloads: >100,000
    » Average online users: 6,000
    » Peak-time online users: 14,000
    » Google entries (CoolStreaming): 5,130
Motivation
• Enable large-scale live broadcasting in the Internet environment
  – Capacity limitation
    » Streaming: 500 Kbps; server outbound bandwidth: 100 Mbps
    » 200 concurrent users only
  – Network heterogeneity
  – No QoS guarantee
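The capacity limit of 200 concurrent users follows directly from dividing the server's outbound bandwidth by the per-user streaming rate. A minimal sketch of that arithmetic, assuming 1 Mbps = 1000 Kbps and a hypothetical function name:

```python
def max_concurrent_users(server_outbound_kbps, stream_kbps):
    """Number of full-rate streams a single server can send at the same time."""
    return server_outbound_kbps // stream_kbps


# 100 Mbps outbound bandwidth and 500 Kbps per stream, as on the slide.
users = max_concurrent_users(100 * 1000, 500)
print(users)  # 200
```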
Data-driven Overlay (DONet)
• Core operations
  – Every node periodically exchanges data availability information with a set of partners
  – It then retrieves unavailable data from one or more partners, or supplies available data to partners
Features of DONet
• Easy to implement
  – No need to construct and maintain a complex global structure
• Efficient
  – Data forwarding is dynamically determined according to data availability, not restricted by specific directions
• Robust and resilient
  – Adaptive and quick switching among multiple suppliers
Key Modules
• A generic system diagram for a DONet node
  – Membership manager
    » mCache: partial overlay view
    » Each entry is a 5-tuple <seq num, id, num partner, time to live, last update time>
    » Updated by gossip
  – Partnership manager
    » Random selection
    » Partner refinement
  – Transmission scheduler
Transmission Scheduling
• Buffer Map (BM)
  – The video stream is divided into segments of uniform length
  – Availability of the segments in the buffer of a node can be represented by a BM
  – Each node continuously exchanges its BM with its partners
  – It then schedules which segment is to be fetched from which partner accordingly
• Problem: from which partner to fetch which data segment?
• Constraints
  – Data availability
  – Playback deadline
  – Heterogeneous partner bandwidth
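A buffer map can be pictured as one bit per segment in the current window. The sketch below compares a node's own BM with the BMs received from its partners to find, for each missing segment, the partners that could supply it. The representation as Python lists and the names `suppliers_by_segment`, `p1`, and `p2` are assumptions made for illustration.

```python
def suppliers_by_segment(my_bm, partner_bms):
    """For every segment this node lacks, list the partners that hold it.

    my_bm:       list of 0/1 flags, one per segment in the window
    partner_bms: {partner: list of 0/1 flags over the same window}
    """
    wanted = {}
    for seg, have in enumerate(my_bm):
        if have:
            continue
        holders = [p for p, bm in partner_bms.items() if bm[seg]]
        if holders:
            wanted[seg] = holders
    return wanted


# Made-up window of five segments.
my_bm = [1, 1, 0, 0, 0]
partner_bms = {
    "p1": [1, 1, 1, 0, 0],
    "p2": [1, 1, 1, 1, 0],
}
candidates = suppliers_by_segment(my_bm, partner_bms)
print(candidates)  # {2: ['p1', 'p2'], 3: ['p2']}
```

Segment 4 is missing everywhere, so it does not appear in the result; the scheduler can only choose among segments that at least one partner holds.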
Scheduling algorithm
• Variation of parallel machine scheduling
  – NP-hard
• Heuristic
  – Messages exchanged
    » Window-based buffer map (BM): data availability
    » Segment request (piggybacked on BM)
  – Fewest suppliers first: a segment with fewer potential suppliers has more difficulty meeting the deadline constraint
  – Multi-supplier: highest bandwidth within deadline first
• Simpler algorithm in current implementation
• Network coding?
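The heuristic can be sketched as a greedy loop: handle segments in order of increasing supplier count, and for each one pick the highest-bandwidth supplier that can still deliver it before its deadline. This is a simplified reading of the two rules on the slide, not the CoolStreaming implementation. Bandwidth is expressed in segments per second, deadlines in seconds from now, and all names and numbers are assumptions.

```python
def schedule(suppliers, bandwidth, deadline):
    """Greedy segment scheduling.

    suppliers: {segment: [partners holding it]}
    bandwidth: {partner: segments per second it can send to us}
    deadline:  {segment: seconds from now by which it must arrive}
    Returns {segment: chosen partner} for the segments that can be met in time.
    """
    busy = {p: 0.0 for p in bandwidth}  # time each partner is already committed
    plan = {}
    # Fewest suppliers first; ties are broken by segment number.
    for seg in sorted(suppliers, key=lambda s: (len(suppliers[s]), s)):
        best = None
        for p in suppliers[seg]:
            finish = busy[p] + 1.0 / bandwidth[p]
            in_time = finish <= deadline[seg]
            if in_time and (best is None or bandwidth[p] > bandwidth[best]):
                best = p
        if best is not None:
            busy[best] += 1.0 / bandwidth[best]
            plan[seg] = best
    return plan


suppliers = {2: ["p1", "p2"], 3: ["p2"]}
bandwidth = {"p1": 1.0, "p2": 2.0}
deadline = {2: 1.0, 3: 0.5}
plan = schedule(suppliers, bandwidth, deadline)
print(plan)  # {3: 'p2', 2: 'p2'}
```

In this example the ordering matters: if segment 2 were scheduled first, it would occupy p2 for the first 0.5 s and segment 3, which only p2 holds, would miss its 0.5 s deadline.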
Implementation: CoolStreaming
• First release: May 30, 2004
• Source code: 2000 lines of Python
• Programming time:
  – PlanetLab prototype: 2 weeks
  – Export from prototype: 2 weeks
• Supported formats:
  – Real Video/Windows Media
  – Platform/media independent
• Scale and capacity
  – Total downloads:
  – Peak time: 14,000 concurrent users
  – Streaming rate: 450-700 Kbps
Observations
• The current Internet has enough available bandwidth to support TV-quality streaming (>450 Kbps)
  – Bottleneck: server, end-to-end bandwidth
• A larger data-driven overlay gives better streaming quality
  – Capacity amplification
Future of DONet/Coolstreaming
• Content
  – Solution: DONet/Coolstreaming as a capacity amplifier between content provider and clients
  – Virtually part of the network infrastructure
• Enhancement
  – Scheduling algorithm
    » Simplified version
    » Network coding
  – Transport protocol
    » New TCP or TFRC