1 Content Delivery Networks Web caching. 2 Replica Placement Permanent replicas (mirroring) Server-initiated replicas (push caching) Client-initiated.
Post on 11-Jan-2016
1
Content Delivery Networks: Web caching
2
Replica Placement
• Permanent replicas (mirroring)
• Server-initiated replicas (push caching)
• Client-initiated replicas (pull/client caching)
3
Web Caching
• Example of the web to illustrate caching and replication issues
– Simpler model: clients are read-only, only server updates data
[Figure: browser ↔ Web proxy cache ↔ Web server, with request/response on each hop; below it, browser ↔ Web server directly]
4
Consistency Issues
• Web pages tend to be updated over time
– Some objects are static, others are dynamic
– Different update frequencies (few minutes to few weeks)
• How can a proxy cache maintain consistency of cached data?
– Send invalidate or update
– Push versus pull
5
Push-based Approach
• Server tracks all proxies that have requested objects
• If a web page is modified, notify each proxy
• Notification types
– Indicate object has changed [invalidate]
– Send new version of object [update]
• How to decide between invalidate and updates?
– Pros and cons?
– One approach: send updates for more frequent objects, invalidate for rest
[Figure: Web server pushes to proxy]
6
Push-based Approaches
• Advantages
– Provide tight consistency [minimal stale data]
– Proxies can be passive
• Disadvantages
– Need to maintain state at the server
• Recall that HTTP is stateless
• Need mechanisms beyond HTTP
– State may need to be maintained indefinitely
• Not resilient to server crashes
7
Pull-based Approaches
• Proxy is entirely responsible for maintaining consistency
• Proxy periodically polls the server to see if object has changed
– Use If-Modified-Since HTTP messages
• Key question: when should a proxy poll?
– Server-assigned Time-to-Live (TTL) values
• No guarantee that the object will not change in the interim
[Figure: proxy polls the Web server; the Web server sends a response]
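A minimal sketch of pull-based revalidation, assuming a toy in-memory origin rather than a real HTTP server. The `Origin` and `Proxy` classes and their method names are hypothetical; only the If-Modified-Since / Last-Modified exchange and the 200/304 status codes come from HTTP.

```python
from email.utils import formatdate, parsedate_to_datetime

class Origin:
    """Toy origin server holding one object and its last-modified time."""
    def __init__(self, body, mtime):
        self.body = body
        self.mtime = int(mtime)  # seconds since the epoch

    def update(self, body, mtime):
        self.body, self.mtime = body, int(mtime)

    def get(self, if_modified_since=None):
        """Return (status, headers, body), like a conditional HTTP GET."""
        if if_modified_since is not None:
            since = parsedate_to_datetime(if_modified_since).timestamp()
            if self.mtime <= since:
                return 304, {}, None  # Not Modified: no body is sent
        headers = {"Last-Modified": formatdate(self.mtime, usegmt=True)}
        return 200, headers, self.body

class Proxy:
    """Toy proxy caching a single object."""
    def __init__(self, origin):
        self.origin = origin
        self.cached = None  # (body, Last-Modified header value)

    def poll(self):
        """Revalidate the cached copy; return the status observed."""
        since = self.cached[1] if self.cached else None
        status, headers, body = self.origin.get(if_modified_since=since)
        if status == 200:
            self.cached = (body, headers["Last-Modified"])
        return status

origin = Origin("v1", mtime=1_000_000)
proxy = Proxy(origin)
statuses = [proxy.poll(), proxy.poll()]  # first fetch, then revalidation
origin.update("v2", mtime=1_000_600)
statuses.append(proxy.poll())            # object changed, so a full reply
```

Between the second and third poll the proxy would have served the stale "v1" copy, which is the consistency gap that pull-based schemes accept.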
8
Pull-based Approach: Intelligent Polling
• Proxy can dynamically determine the refresh interval
– Compute based on past observations
• Start with a conservative refresh interval
• Increase interval if object has not changed between two successive polls
• Decrease interval if object is updated between two polls
• Adaptive: No prior knowledge of object characteristics needed
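The adaptive rule above can be sketched as a multiplicative increase/decrease of the polling interval. The function name, the doubling and halving factors, and the 60-second and one-day bounds are illustrative assumptions, not values from the slides.

```python
def next_interval(interval, changed, lo=60.0, hi=86400.0, grow=2.0, shrink=0.5):
    """Return the next polling interval in seconds.

    Shrink the interval when the object changed since the last poll,
    grow it otherwise, and clamp the result to [lo, hi].
    """
    factor = shrink if changed else grow
    return max(lo, min(hi, interval * factor))

interval = 60.0  # start with a conservative (short) refresh interval
history = []
for changed in [False, False, False, True, False]:
    interval = next_interval(interval, changed)
    history.append(interval)
# history == [120.0, 240.0, 480.0, 240.0, 480.0]
```

The proxy needs no prior knowledge of the object: the interval converges toward the object's observed update rate from the poll results alone.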
9
Pull-based Approach
• Advantages
– Implementation using HTTP (If-Modified-Since)
– Server remains stateless
– Resilient to both server and proxy failures
• Disadvantages
– Weaker consistency guarantees (objects can change between two polls and proxy will contain stale data until next poll)
• Strong consistency only if the proxy polls before every HTTP response
– More sophisticated proxies required
– High message overhead
10
A Hybrid Approach: Leases
• Lease: duration of time for which server agrees to notify proxy of modification
• Issue lease on first request, send notification until expiry
– Need to renew lease upon expiry
• Smooth tradeoff between state and messages exchanged
– Zero duration => polling, Infinite leases => server-push
• Efficiency depends on the lease duration
[Figure: Client, Proxy, Server; the client reads via the proxy; messages between proxy and server: "Get + lease req", "Reply + lease", "Invalidate/update"]
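A minimal sketch of the lease idea, assuming a toy server that records one expiry time per (proxy, object) pair. The `LeaseServer` class, its method names, and the explicit `now` arguments are hypothetical and chosen only to make the example deterministic.

```python
class LeaseServer:
    """Toy origin that notifies only proxies holding an unexpired lease."""
    def __init__(self, lease_duration):
        self.lease_duration = lease_duration
        self.leases = {}  # (proxy_id, url) -> expiry time

    def get(self, proxy_id, url, now):
        """Serve a request and grant (or renew) a lease on the object."""
        expiry = now + self.lease_duration
        self.leases[(proxy_id, url)] = expiry
        return expiry

    def modify(self, url, now):
        """Return the proxies that must be notified that url changed."""
        # Expired leases are dropped, which bounds the state kept at the server.
        self.leases = {k: exp for k, exp in self.leases.items() if exp > now}
        return sorted(p for (p, u) in self.leases if u == url)

server = LeaseServer(lease_duration=30)
server.get("proxyA", "/index.html", now=0)
server.get("proxyB", "/index.html", now=20)
notified_early = server.modify("/index.html", now=25)  # both leases are live
notified_late = server.modify("/index.html", now=40)   # proxyA's lease ended at 30

# A zero-length lease is never live, so the server never notifies anyone
# and proxies fall back to polling.
polling_only = LeaseServer(lease_duration=0)
polling_only.get("proxyA", "/index.html", now=0)
notified_zero = polling_only.modify("/index.html", now=0)
```

A very large `lease_duration` behaves like pure server push, at the cost of the server holding state for every proxy indefinitely.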
11
Policies for Lease Duration
• Age-based lease
– Based on bi-modal nature of object lifetimes
– The larger the expected lifetime, the longer the lease
• Renewal-frequency based
– Based on skewed popularity
– A proxy at which an object is popular gets a longer lease
• Server load based
– Based on adaptively controlling the state space
– Shorter leases during heavy load
12
Cooperative Caching
• Caching infrastructure can have multiple web proxies
– Proxies can be arranged in a hierarchy or other structures
• Overlay network of proxies: content distribution network
– Proxies can cooperate with one another
• Answer client requests
• Propagate server notifications
13
Hierarchical Proxy Caching
Examples: Squid, Harvest
[Figure: hierarchy of clients, leaf caches, a parent cache, and the server; labels: Read A, ICP, HTTP, steps 1–3]
14
Locating and Accessing Data
Properties
• Lookup is local
• Hit at most 2 hops
• Miss at most 2 hops (1 extra on wrong hint)
• Minimize cache hops on hit
• Do not slow down misses
[Figure: clients, caches (Node X, Node Y, Node Z), and the server for B; labels: Read A, Get A, Read B, Get B, hint (A,X)]
15
CDN Issues (Content Delivery Network)
• Which proxy answers a client request?
– Ideally the “closest” proxy
– Akamai uses a DNS-based approach
• Propagating notifications
– Can use multicast or application-level multicast to reduce overheads (in push-based approaches)
• Active area of research
– Numerous research papers available
Case Study: Akamai
17
Basic concepts
Serving Web content from a single location can present serious problems:
• Scalability
• Reliability
• Performance
How to serve requests from a variable number of surrogate origin servers … at the network’s edge?
• Flash crowds?
• Seasonal traffic spikes?
• Over-provisioning?
• Capacity planning?
Caching at the edge as a shock absorber
18
Content Distribution Bottlenecks
• Congestion delay
• Local outage lost data
• 1st mile problem:
– Connection between content/service provider & ISP
• Backbone & router/switch load & failures
• Peering between ISPs
• Last mile problem:
– Connection to users’ access points
19
Why bother with a company?
20
21
Content Distribution Networks
• Content Provider != Content Distributor
– CDNs “promise” improved response times & reliability, including handling of “hot spots”
• Without tremendous infrastructure & personnel investments by the content providers
• Components of a CDN:
– Distributed server load balancing
– DNS redirection, hashing & fault tolerance
– Distributed system monitoring
– Distributed software configuration management
– Live stream distribution & entry points
– Log collection, reporting & performance monitoring
– Client provisioning mechanism
– Content management & replication
22
Hosted e-business architecture
23
3-tier CDN architecture
24
Market Size of Internet CDN
• Assuming 250M Web users
– With average B/W consumption 10 Kbits/sec
• 10% of users online at any given time
• … the total B/W requirement is 250 Gbits/sec
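The 250 Gbits/sec figure follows directly from the three assumptions on the slide. A short check, with variable names chosen here for illustration:

```python
users = 250_000_000        # Web users (from the slide)
online_percent = 10        # share of users online at any given time
per_user_bps = 10_000      # average consumption: 10 Kbits/sec

online_users = users * online_percent // 100   # 25 million concurrent users
total_bps = online_users * per_user_bps
total_gbps = total_bps / 1e9                   # 250.0
```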
[Figure: percent of Internet traffic (y-axis, 0 to 45) versus number of Web sites (x-axis: 20, 100, 300, 1000)]
25
Akamai Services
• EdgeSuite & FreeFlow (core products):
– GIF & static HTML delivery, streaming delivery, reporting
• StorageFlow:
– replicated hosting of large files
• Digital Parcel Service:
– digital rights mgmt for large files
• FirstPoint: global server load balancing
– Up-to-date map of the best routes
– Mirror fail-over
15,000+ servers in over 1,200 networks in 60+ countries
26
State diagram
[Figure: steps 1–6 among User, DNS Server, “Top Level” DNS, “Low Level” DNS, Nearby Ghost, and Content Provider Site]
27
Serving user requests
28
“Akamaized” content (I)
[Figure: two versions of a page. In one, the entire Web page is delivered by CNN; in the other, the HTML is delivered by CNN and the embedded objects are “Akamaized”]
29
“Akamaized” content (II)
Akamaized URL:
http://a8.g.akamai.net/f/8/1162/1h/images.cnn.com/ads/advertiser/fidelity/0104/160X60Fidelity.gif
URL components (figure labels):
• 3-stage DNS name for GSLB
• Page consistency policy
• Embedded original hostname
“Akamaizer” filter/plug-in
• for IIS, Apache, …
30
“Akamaized” content (III)
- Define the default metadata for the domain(s) that you want to serve, using the distributed architecture. This default metadata can be overridden at a per-object level using host-response headers or URL-prepending metadata.
- For dynamically generated content, mark up the content for assembly at the edge by inserting the appropriate ESI tags in your templates or in your HTML or in your JSP/ASP pages.
- Finally, integrate the content generation environment with the content assembly environment through a 3-step hand-off of DNS:
1) Turn recursion off on the authoritative name server(s).
2) Set up a private hostname for Akamai to poll.
3) CNAME the live hostname to Akamai.
31
“Akamaized” content (IV)
When an end-user of the Web application visits www.yoursite.com, the user’s local name server gets directed to Akamai’s DNS.
The Akamai DNS then resolves the reference to the optimally located edge server.
The edge server assembles the relevant content based on the rules established in the metadata instructions that are stored locally.
- Static content is typically retrieved from cache to be served to the browser.
- Dynamic content is either assembled from page fragments stored in the cache, or retrieved from the origin server. The page assembly engine then assembles these fragments to be tailored & personalized to the user’s geographic location, cookie, device or other chosen mechanisms.
32
“Akamaized” content (V)
• http://www.mydomain.com/frontpage.jpg
→ http://xxxx.yy.zzzz.net/aaaa/frontpage.jpg
• xxxx: serial number
• yy: lower-level DNS
• zzzz: top-level DNS
• aaaa: fingerprint
ghost1467.ghosting.akamai.net
1. Determine client’s location (IP block)
2. Top-level DNS server uses map to locate a close-by low-level DNS server … set TTL to a relatively high value
3. Client’s local DNS server contacts close-by low-level DNS server to request a lookup for a surrogate server … set TTL to a relatively low value
Keep track of each server’s projected load …
Buddy system for servers
Lookups return list of servers
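To make the naming scheme on this slide concrete, here is a small sketch that splits a URL of the schematic form http://xxxx.yy.zzzz.net/aaaa/object into the four labelled parts. The function name and the returned keys are hypothetical, and the parser assumes exactly the schematic layout shown on the slide, not every real Akamaized URL.

```python
from urllib.parse import urlsplit

def parse_akamaized(url):
    """Split a URL of the schematic form http://xxxx.yy.zzzz.net/aaaa/<object>."""
    parts = urlsplit(url)
    # Hostname labels, left to right: serial number, low-level DNS, top-level DNS, TLD.
    serial, low_level, top_level, _tld = parts.hostname.split(".")
    # The first path segment is the fingerprint; the rest names the object.
    fingerprint, _, obj = parts.path.lstrip("/").partition("/")
    return {
        "serial_number": serial,
        "low_level_dns": low_level,
        "top_level_dns": top_level,
        "fingerprint": fingerprint,
        "object": obj,
    }

fields = parse_akamaized("http://xxxx.yy.zzzz.net/aaaa/frontpage.jpg")
```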
33
“Akamaized” content (VI)
• Load data is circulated within each region
– … for each serial number
• Serial numbers are processed in increasing order of projected load
– For each serial number, a random priority list of desired servers is assigned
• … using consistent hashing
– Each serial number is then resolved to the smallest initial segment of servers from the priority list so that no server becomes overloaded
• Initially, every serial number is mapped to every server.
• Iterative refinement of the assignments so as to balance the load with the minimum amount of replication
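The "smallest initial segment" rule can be sketched as follows. This is an illustration under stated assumptions, not Akamai's actual algorithm: the hash-based priority order, the single per-server `capacity`, and the even split of a serial number's load across its chosen servers are all assumptions made here, and the function names are hypothetical.

```python
import hashlib

def priority_list(serial, servers):
    """Deterministic pseudo-random ordering of servers for one serial number."""
    return sorted(
        servers, key=lambda s: hashlib.sha1(f"{serial}:{s}".encode()).digest()
    )

def assign(loads, servers, capacity):
    """Map each serial number to the shortest prefix of its priority list
    that keeps every chosen server at or below `capacity`.

    loads: {serial: projected load}; load is split evenly over the prefix.
    `servers` must be non-empty.
    """
    used = {s: 0.0 for s in servers}
    result = {}
    for serial in sorted(loads, key=loads.get):  # increasing projected load
        order = priority_list(serial, servers)
        for k in range(1, len(order) + 1):
            share = loads[serial] / k
            if all(used[s] + share <= capacity for s in order[:k]):
                break
        # If no prefix fits, k ends at len(order): spread over every server.
        for s in order[:k]:
            used[s] += loads[serial] / k
        result[serial] = order[:k]
    return result, used

servers = ["s1", "s2", "s3", "s4"]
loads = {101: 10, 102: 40, 103: 90}
result, used = assign(loads, servers, capacity=100)
```

Lightly loaded serial numbers end up on a single server, while heavier ones are replicated onto additional servers only when the first choices lack headroom.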
34
Akamai in action (1st request)
35
Akamai in action (subsequent requests)
36
CDN Reduced latency benefit
• Bandwidth x Delay product: limit on outstanding packets (in-flight, unacknowledged)
• TCP: ~8 RTTs to fill 1 Mbps pipe
– ~11 RTTs, by including DNS round-trip & TCP handshake
– 128 KB over 11 RTTs
– If RTT=60 msec (e.g., US coast-to-coast), we need ~600 msec to fill the pipe
– If RTT=3 msec (e.g., nearby CDN node), we need ~30 msec to fill the pipe
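A quick check of the arithmetic above. The first function multiplies the slide's 11 round trips by the RTT. The second shows one way to arrive at roughly 128 KB, assuming a window that starts at one 512-byte segment and doubles every RTT for 8 RTTs; the segment size is an assumption, not a value given on the slide.

```python
def fill_time_ms(rtt_ms, rtts=11):
    """Elapsed time for `rtts` round trips at the given RTT."""
    return rtts * rtt_ms

def slow_start_bytes(rtts, segment_bytes=512):
    """Bytes sent if the window starts at one segment and doubles every RTT."""
    return segment_bytes * (2 ** rtts - 1)

coast_to_coast = fill_time_ms(60)  # 660 ms, which the slide rounds to ~600 ms
nearby_cdn = fill_time_ms(3)       # 33 ms, which the slide rounds to ~30 ms
sent = slow_start_bytes(8)         # 130560 bytes, close to 128 KB
```

The ratio is what matters: cutting the RTT by a factor of 20 cuts the ramp-up time by the same factor, independent of the exact number of round trips.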
37
The Black Art of “Network Mapping” (P. Danzig, 2001)
• Network mapping chooses reasonable data centers to satisfy a client request.
• Factors to consider:
– Contracted data center bandwidth
– Path characteristics: RTT, Bottleneck Bandwidth, “Experience”, Autonomous Systems Crossed, Hop Count, Observed loss rates, etc.
– How do you measure these factors?
38
Network Mapping “Options”
• Cisco Boomerang
– Synchronized DNS servers
• Radware’s DSLB box
– Linear combination of hop count & RTT
• F5’s 3DNS
– ICMP ping
• Alteon, Foundry, Resonate, …
• Akamai’s FirstPoint
– Lowers Yahoo’s response times by ~18%
39
Consistent Hashing
• Hash function that maps URLs to a dynamic set of available caches
– A machine can locally compute exactly which cache should contain a given object.
• Push the page-location task down to the individual clients
– A unicast suffices to get the object or determine that it is not cached, decreasing network usage compared to multicast or directory schemes.
• It also discovers misses faster than multicast schemes that must wait for all caches to respond.
– It avoids the maintenance & query overhead associated with directory based schemes.
– It does not create new points of failure for the system
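A minimal hash-ring sketch of consistent hashing, assuming SHA-1 as the hash and 100 virtual points per cache; the `Ring` class, the cache names, and the example URLs are hypothetical. Any machine holding the same list of caches computes the same owner for a URL locally, with no directory lookup.

```python
import bisect
import hashlib

def _point(key):
    """Map a string to a point on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, caches, replicas=100):
        # Each cache gets `replicas` virtual points to even out the load.
        self.points = sorted(
            (_point(f"{c}#{i}"), c) for c in caches for i in range(replicas)
        )
        self.keys = [p for p, _ in self.points]

    def lookup(self, url):
        """Return the cache owning the first ring point at or after hash(url)."""
        i = bisect.bisect_left(self.keys, _point(url)) % len(self.keys)
        return self.points[i][1]

urls = [f"http://example.com/obj{i}" for i in range(1000)]
before = Ring(["cacheA", "cacheB", "cacheC"])
after = Ring(["cacheA", "cacheB", "cacheC", "cacheD"])
moved = [u for u in urls if before.lookup(u) != after.lookup(u)]
# Every URL that moved went to the new cache; nothing shuffled among old ones.
```

This is the property that makes the cache set "dynamic": adding or removing a cache remaps only the URLs that cache gains or loses, so most cached objects stay where they are.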
40
Break-down of Web traffic
• GIF & JPEG: 55%
• HTML: 25%
• Misc: 20%
• Delivering static HTML from caches is fast!
• HTML: 1/3 static, 2/3 dynamic
– How can we make dynamic HTML faster?
• EdgeSuite service:
– Construct or “assemble” dynamic HTML within the CDN, via proprietary language extensions
• ESI (Edge-Side Includes): http://www.esi.org
– Akamai + Oracle initiative
41
Update of dynamic content
42
Live Streaming CDN
43
References
• Akamai Inc: http://www.akamai.com
• D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin and R. Panigrahy, “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web”, Proc. ACM Symposium on Theory of Computing, 1997.
• D.R. Karger, A. Sherman, A. Berkheimer, W. Bogstad, R. Dhanidina, K. Iwamoto, B. Kim, L. Matkins, Y. Yerushalmi, “Web Caching with Consistent Hashing”, Proc. 8th WWW Conference, 1999.
• J. Dilley et al., “Globally Distributed Content Delivery”, IEEE Internet Computing, vol. 6, no. 5, pp. 50-59, 2002.
• US Patent #6,108,703, Aug. 2000: “Global hosting system”
• L. Kontothanassis et al., “A Transport Layer for Live Streaming in a Content Delivery Network”, Proc. of the IEEE, vol. 92, no. 9, 2004.
• A. Sherman et al., “ACMS: The Akamai Configuration Management System”, Proc. USENIX NSDI, 2004.
• J. Jung, B. Krishnamurthy, M. Rabinovich, “Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites”, Proc. 11th WWW Conference, 2002.