
High Performance P2P Web Caching


Jared Friedman

This was a presentation I did for Harvard's CS264. I think Erik and I had some really interesting ideas here. I don't have time to pursue them now, but I posted this in the hopes that it might inspire someone else.
Transcript
Page 1: High Performance P2P Web Caching

High Performance P2P Web Caching

Erik Garrison
Jared Friedman

CS264 Presentation
May 2, 2006

Page 2: High Performance P2P Web Caching

SETI@Home

● Basic Idea: people donate computer time to look for aliens

● Delivered more than 9 million CPU-years
● Guinness Book of World Records – largest computation ever
● Many other successful projects (BOINC, Google Compute)
● The point: many people are willing to donate computer resources for a good cause

Page 3: High Performance P2P Web Caching

Wikipedia

● About 200 servers required to keep the site live

● Hosting & hardware costs over $1M per year
● All revenue from donations
● Hard to make ends meet
● Other not-for-profit websites are in a similar situation

Page 4: High Performance P2P Web Caching

HelpWikipedia@Home

● What if people could donate idle computer resources to help host not-for-profit websites?

● They probably would!
● This is the goal of our project

Page 5: High Performance P2P Web Caching

Prior Work

● This doesn't exist
● But some things are similar

Content Distribution Networks (Akamai)
● Distributed web hosting for big companies

CoralCDN / CoDeeN
● P2P web caching, like our idea
● But a very different design
● Both have some problems

Page 6: High Performance P2P Web Caching

Akamai, the opportunity

● Internet traffic is 'bursty'
● Expensive to build infrastructure to handle flash crowds
● International audience, local servers
  Sites run slowly in other countries

Page 7: High Performance P2P Web Caching

Akamai, how it works

● Akamai put >10,000 servers around the globe

● Companies subscribe as Akamai clients
● Client content (mostly images and other media) is cached on Akamai's servers
● Tricks with DNS make viewers download content from nearby Akamai servers
● Result: the website runs fast everywhere, no worries about flash crowds
● But VERY expensive!

Page 8: High Performance P2P Web Caching

CoralCDN

● P2P web caching
● Probably the closest system to our goal
● Currently in late-stage testing on PlanetLab
● Uses an overlay and a 'distributed sloppy hash table'
● Very easy to use – just append '.nyud.net' to a URL and Coral handles it (example below)
● Unfortunately ...
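For example, a Coralized version of http://en.wikipedia.org/wiki/Cache would look something like http://en.wikipedia.org.nyud.net:8090/wiki/Cache (the exact port suffix varied across Coral deployments of that era); Coral's DNS resolves the .nyud.net name to a nearby Coral proxy, which fetches and caches the page on the site's behalf.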

Page 9: High Performance P2P Web Caching

Coral: Problems

● Currently very slow
  This might improve in later versions
  Or it might be due to the overlay structure

● Security: volunteer nodes can respond with fake data

● Any site can use Coral to help reduce load
  Just append .nyud.net to their internal links
● Decentralization makes optimization hard
  More on this later

Page 10: High Performance P2P Web Caching

Our Design Goals

● Fast: Akamai-level performance
● Secure: pages served are always genuine
● Fast updates possible
● Must greatly reduce demands on the main site
  But this cannot compromise the first 3

Page 11: High Performance P2P Web Caching

Our Design

● Node/supernode structure
  Takes advantage of extremely heterogeneous performance characteristics
● Custom DNS server redirects incoming requests to a nearby supernode
● Supernode forwards the request to a nearby ordinary node
● Node replies to the user

Page 12: High Performance P2P Web Caching

Our Design

User goes to wikipedia.org
DNS server resolves wikipedia.org to a supernode
Supernode forwards the request to an ordinary node that has the requested document
Node retrieves the document and sends it to the user (a code sketch of the supernode's routing step follows)
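The deck only shows this flow as a diagram; the TypeScript sketch below illustrates one way a supernode could pick a volunteer node for an incoming request: prefer nodes that already hold the document, sit in the user's region, and are not overloaded. All names, fields, and thresholds here are assumptions for illustration, not the authors' implementation.

    // Hypothetical supernode routing step, matching the flow above.
    interface VolunteerNode {
      id: string;
      region: string;          // coarse geographic region
      load: number;            // current requests per second
      capacity: number;        // sustainable requests per second
      docs: Set<string>;       // documents cached on this node
    }

    function pickNode(
      nodes: VolunteerNode[],
      docId: string,
      userRegion: string
    ): VolunteerNode | null {
      // Only consider nodes that hold the document and have spare capacity.
      const candidates = nodes.filter(
        (n) => n.docs.has(docId) && n.load / n.capacity < 0.9
      );
      if (candidates.length === 0) return null;  // fall back to the origin server

      // Prefer nodes in the user's region, then the least-loaded node.
      candidates.sort((a, b) => {
        const regionDiff =
          Number(b.region === userRegion) - Number(a.region === userRegion);
        return regionDiff !== 0
          ? regionDiff
          : a.load / a.capacity - b.load / b.capacity;
      });
      return candidates[0];
    }

In the full design the supernode would also try to send all objects for one page (HTML plus images) to the node it picks, as the Performance slide notes.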

Page 13: High Performance P2P Web Caching

Performance

● Requests are answered in only 2 hops
● DNS server resolves to a geographically close supernode
● Supernode avoids sending requests to slow or overloaded nodes
● All parts of a page (e.g., HTML and images) should be served by a single node

Page 14: High Performance P2P Web Caching

Security

● Have to check nodes' accuracy
● First line of defense: encrypt local content
● May delay attacks, but won't stop them

Page 15: High Performance P2P Web Caching

Security

● More serious defense: let users check the volunteer nodes!

● Add a JavaScript wrapper to the website that requests the pages using AJAX
● With some probability, the AJAX script will compute the MD5 of the page it got and send it to a trusted central node
● Central node kicks out nodes that frequently return pages with invalid MD5 sums
● Offload processing not just to nodes, but to users, with zero-install (a client-side sketch follows)
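As a concrete illustration of the probabilistic check described above, here is a minimal client-side sketch in TypeScript. The endpoint name and sampling probability are made up, and the browser's SubtleCrypto API has no MD5, so SHA-256 stands in for the MD5 hash mentioned on the slide.

    // Hypothetical client-side integrity check (illustrative only).
    const AUDIT_PROBABILITY = 0.05;   // audit roughly 5% of page loads
    const AUDIT_ENDPOINT = "https://central.example.org/report";  // made-up URL

    async function fetchWithAudit(url: string): Promise<string> {
      const response = await fetch(url);
      const body = await response.text();

      if (Math.random() < AUDIT_PROBABILITY) {
        // Hash the bytes actually received from the volunteer node.
        const digestBuf = await crypto.subtle.digest(
          "SHA-256",
          new TextEncoder().encode(body)
        );
        const digestHex = Array.from(new Uint8Array(digestBuf))
          .map((b) => b.toString(16).padStart(2, "0"))
          .join("");

        // Report (url, digest) to the trusted central node, which compares it
        // against the known-good hash and blacklists nodes that fail too often.
        navigator.sendBeacon(AUDIT_ENDPOINT, JSON.stringify({ url, digestHex }));
      }
      return body;
    }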

Page 16: High Performance P2P Web Caching

A Tricky Part

● Supernodes get requests and have to decide which node should answer which request
● Have to load-balance nodes – no overloading
● Popular documents should be replicated across many nodes
● But don't want to replicate unpopular documents much – conserve storage space
● Lots of conflicting goals!

Page 17: High Performance P2P Web Caching

On the plus side...

● Unlike Coral & CoDeeN, supernodes know a lot of nodes (maybe 100-1000?)

● They can track performance characteristics of each node

● Make object placement decisions from a central point

● Lots of opportunity to make really intelligent decisions
  Better use of resources
  Higher total system capacity
  Faster response times

Page 18: High Performance P2P Web Caching

Object Placement Problem

● This kind of problem is known as an object placement problem
  “Which nodes do we put which files on?”
● Also related to the request routing problem
  “Given the files currently on the nodes, which node do we send this particular request to?”
● These problems are basically unsolved for our scenario
● Analytical solutions exist only for very simplified, somewhat different cases
● We suspect a useful analytic solution is impossible here (a heuristic sketch follows)
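The later slides mention a heuristic object placement algorithm without spelling it out. One plausible greedy rule, sketched below in TypeScript (reusing the VolunteerNode type from the routing sketch; the replication threshold is an assumption), replicates a document onto another node only when demand outgrows its current replicas, so hot documents spread while unpopular ones stay cheap to store.

    // Hypothetical placement heuristic (illustrative, not the authors' algorithm).
    const REQUESTS_PER_REPLICA = 50;   // assumed demand one replica can absorb

    function maybeReplicate(
      nodes: VolunteerNode[],
      docId: string,
      recentRequests: number           // requests for docId in the last interval
    ): void {
      const holders = nodes.filter((n) => n.docs.has(docId));
      if (holders.length === 0) return;

      // Popular document: demand exceeds what current replicas can absorb.
      if (recentRequests / holders.length > REQUESTS_PER_REPLICA) {
        // Copy it onto the least-loaded node that does not yet hold it.
        const target = nodes
          .filter((n) => !n.docs.has(docId))
          .sort((a, b) => a.load / a.capacity - b.load / b.capacity)[0];
        if (target) target.docs.add(docId);
      }
      // Unpopular documents keep a single replica to conserve storage space.
    }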

Page 19: High Performance P2P Web Caching

Simulation

● Too hard to solve analytically, so do a simulation

● Goal is to explore different object placement algorithms under realistic scenarios

● Also want to model the performance of the whole system
  What cache hit ratios can we get?
  How does the number/quality of peers affect cache hit ratios?
  How is user latency affected?
● Built a pretty involved simulation in Erlang (a simplified sketch of the core loop follows)
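The authors' simulator was written in Erlang and is not included in the deck. Purely as an illustration of its general shape, the TypeScript toy below drives the routing and placement sketches from the earlier slides with a crude Zipf-like request stream and reports the resulting cache hit ratio; every parameter is invented.

    // Toy driver tying the routing and placement sketches together.
    function simulate(numNodes: number, numDocs: number, numRequests: number): number {
      const regions = ["us", "eu", "asia"];
      const nodes: VolunteerNode[] = Array.from({ length: numNodes }, (_, i) => ({
        id: `node-${i}`,
        region: regions[i % regions.length],
        load: 0,
        capacity: 100,
        docs: new Set<string>(),
      }));

      const recent = new Map<string, number>();   // per-interval request counts
      let hits = 0;

      for (let r = 0; r < numRequests; r++) {
        if (r % 1000 === 0) {                      // start a new measurement interval
          nodes.forEach((n) => (n.load = 0));
          recent.clear();
        }
        // Crude Zipf-like popularity: low document indices dominate the stream.
        const docId = `doc-${Math.floor(numDocs * Math.pow(Math.random(), 3))}`;
        const userRegion = regions[Math.floor(Math.random() * regions.length)];
        recent.set(docId, (recent.get(docId) ?? 0) + 1);

        const node = pickNode(nodes, docId, userRegion);
        if (node) {
          node.load += 1;
          hits += 1;
        } else {
          // Cache miss: the origin server answers and seeds one volunteer node.
          nodes[Math.floor(Math.random() * numNodes)].docs.add(docId);
        }
        maybeReplicate(nodes, docId, recent.get(docId) ?? 0);
      }
      return hits / numRequests;                   // overall cache hit ratio
    }

    console.log(`hit ratio: ${simulate(50, 10000, 100000).toFixed(3)}`);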

Page 20: High Performance P2P Web Caching

Simulation Results

● So far, encouraging!
● Main results use a heuristic object placement algorithm
● Can load-balance without creating hotspots up to about 90% of theoretical capacity
● Documents are rarely requested more than once from the central server
● Close to the theoretical optimum

Page 21: High Performance P2P Web Caching

Next Steps

● Add more detail to the simulation
  Node churn
  Better internet topology
● Explore update strategies
● Obviously, an actual implementation would be nice, but not likely to happen this week
● What do you think?