Design, Implementation and Evaluation of a Client Characterization Driven Web Server
Balachander Krishnamurthy – AT&T Research Labs
Craig Wills – WPI
Yin Zhang – AT&T Research Labs
Kashi Vishwanath – Duke University
Motivation
• Users with varying connectivity request a variety of files
• Web site wants no user to lose interest
• Handle requests based on connectivity
[Figure: end users connecting to the Web server]
Outline of our solution
• Figure out connectivity: use Web server logs
• Classify clients (IPs): store in shared memory
  • Cluster: group of IPs [KW00]
• When a page is requested, the Web server:
  • Identifies client connectivity
  • Takes action if required: alternate content, etc.
What is a page access?
• User clicks: container page, e.g. /foo.html
• Subsequent requests: embedded objects, e.g. /img1.gif
• At the server: how long before the user
  • starts getting the first object? E_first
  • gets the entire page? E_last
• Smoothed means for popular pages
• Classes (poor, rich, normal) [KW02]
  • Poor: E_first > 3 and E_last > 5
  • Rich: E_first <= 1 and E_last <= 2
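The class thresholds above can be sketched as a small function. This is only an illustration: the name `classify` is hypothetical, and treating every client that is neither poor nor rich as normal is an assumption that the slides imply but do not state. The slides also give no unit for E_first and E_last, so the function simply compares the numbers as given.

```python
def classify(e_first, e_last):
    """Map smoothed E_first / E_last values to a connectivity class.

    Thresholds are the ones on the slide; the fall-through to
    'normal' is an assumption.
    """
    if e_first > 3 and e_last > 5:
        return "poor"
    if e_first <= 1 and e_last <= 2:
        return "rich"
    return "normal"
```

For example, `classify(4, 6)` falls in the poor class, while a client with a slow first object but a fast full page, such as `classify(4, 2)`, meets neither rule and is treated as normal.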
Modified Apache Server
• Modification to Apache 1.3.24: three files, less than 200 lines
  • http_main.c, http_protocol.c, http_core.c
• One-time overhead
  • Open cluster library
  • Read in configuration file
Modified Apache Server: Per-Request Changes
• Is it a popular page (URI)?
• Look up class (in shared memory data)
• Cluster lookup if IP unavailable
• Tailored action if appropriate, and log it
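The per-request steps above can be sketched roughly as follows. This is not the Apache patch itself: the dictionaries stand in for the shared-memory tables, the addresses and the `/poor` URI prefix are made up, and defaulting unknown clients to "normal" is an assumption. The cluster fallback is modelled here as a longest-prefix match, which is one plausible reading of "group of IPs".

```python
import ipaddress

# Hypothetical stand-ins for the shared-memory data.
POPULAR_URIS = {"/foo.html"}
IP_CLASS = {"192.0.2.7": "poor"}
CLUSTER_CLASS = {ipaddress.ip_network("198.51.100.0/24"): "rich"}


def lookup_class(ip):
    """Look up the class by IP, falling back to the client's cluster."""
    if ip in IP_CLASS:
        return IP_CLASS[ip]
    addr = ipaddress.ip_address(ip)
    best = None
    for net, cls in CLUSTER_CLASS.items():
        # Keep the most specific (longest-prefix) matching cluster.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, cls)
    return best[1] if best else "normal"  # assumed default


def handle_request(ip, uri, log):
    """Return the URI to serve, rewriting it for poor clients."""
    if uri not in POPULAR_URIS:
        return uri  # only popular container pages are considered
    cls = lookup_class(ip)
    if cls == "poor":
        new_uri = "/poor" + uri  # hypothetical alternate content
        log.append((ip, uri, new_uri, cls))
        return new_uri
    return uri
```

Checking the URI first keeps the common case cheap: requests for pages outside the popular set skip both lookups.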
Testing: How to choose test pages
• Proxy logs from a large manufacturing company (Dec 2001)
  • Over 100,000 users
• Select the 1000 most popular pages
• Download these: April 2002
• 641 pages successfully reconstructed
• 33rd and 66th percentile values for each characteristic
  • Container bytes, number of embedded objects, embedded bytes
• 3 x 3 x 3 = 27 buckets of pages
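The bucketing above can be sketched as follows. This is an illustration under assumptions: the field names, the simple index-based percentile, and the choice to count values equal to a cutoff in the lower bucket are not from the slides.

```python
# Hypothetical field names for the three page characteristics.
KEYS = ("container_bytes", "embedded_objects", "embedded_bytes")


def cutoffs(values):
    """Return approximate 33rd and 66th percentile values."""
    s = sorted(values)
    n = len(s)
    return s[(n - 1) * 33 // 100], s[(n - 1) * 66 // 100]


def size_label(value, low, high):
    """Label a value relative to the two cutoffs."""
    if value <= low:
        return "small"
    if value <= high:
        return "medium"
    return "large"


def bucket(page, cuts):
    """Map a page (dict) to one of the 3 x 3 x 3 = 27 buckets."""
    return tuple(size_label(page[k], *cuts[k]) for k in KEYS)
```

In use, `cuts` would be built once from the downloaded pages, for example `{k: cutoffs([p[k] for p in pages]) for k in KEYS}`, and each page then maps to a triple such as `("small", "medium", "large")`.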
Objects   Small  Medium  Large   Small  Medium  Large   Small  Medium  Large
Small        20       6      2       4       1      0       0       0      0
Medium        2       3      1       5       7      8       2       5      3
Large         0       0      0       1       2      4       1       8     14
Server Overhead: Latency increase at client
• Generate IPs (X-IP header)

Overhead (usec)
Step                                 Mean   Median   Std. dev.
Is URI a container document?         19.2        2        12.1
Class lookup in shared memory        12.5        9         5.8
Cluster related overhead
  Converting IP address               4.2        3         6.1
  Looking up cluster                    8        7         4.8
  Cluster lookup in shared memory     2.5        2         4.9
  Classification based on cluster     0.7        0         4.4
Server actions
  Modifying URI                      25.5       25         6.1
  Logging changed request             2.8        3         3.6
Total overhead                       75.4       51        18.2
Stress Test
[Figure: plot of average increase in processing time for the modified server. X axis: number of concurrent connections (1 to 1000). Y axis: average server overhead % (0 to 14).]
Placing Clients and Modified Servers
• Prototype Apache server with our test site
  • Linux – att.com in NJ, USA
  • Linux – wpi.com, MA, USA
  • FreeBSD – icir.org, CA, USA
• Clients
  • att: AT&T Labs-Research, NJ, USA
  • de: Saarbruecken University, Germany
  • cable: cable modem user, NJ, USA
  • modem: 56 Kbps dialup modem user, NJ, USA
  • uk: London, U.K., via a dedicated 56 Kbps line
Clients: Observed network characteristics
[Figure: round-trip time (ms, 0 to 900) versus throughput (KB/sec, 0 to 100) for the client-server pairs uk-icir, uk-att, cable-icir, uk-wpi, de-att, att-icir and att-wpi.]
• Spans a wide spectrum
Experiments: Httperf Clients
• Request a similar mix of pages as described earlier
  • 200 random requests each
• New headers: X-Server-Actions, X-Class
• Baseline measure
  • parallel-1.0: up to 4 parallel HTTP/1.0 requests
• Server actions
  • Manner of delivery: compress, serial-1.1, pipeline-1.1, bundle
Server Action                 Benefit
removing embedded objects     all cases
half resolution               modem client
compression                   except well-connected
bundling                      better connected clients, large latency for poor clients
bundle.gz                     except better-connected clients
serial-1.1                    never
pipe-1.1                      high throughput or RTT
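As one concrete instance of a server action from the table, the compression row can be sketched as below. This is a minimal sketch, not the authors' implementation: the function name is hypothetical, and mapping "well-connected" to the rich class is an assumption.

```python
import gzip


def maybe_compress(body, client_class):
    """Gzip the response body unless the client is classed as rich.

    Treating 'rich' as the well-connected case is an assumption.
    """
    headers = {}
    if client_class != "rich":
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers
```

A real server would also need to confirm that the client accepts gzip (for example via its Accept-Encoding request header) before applying this action.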
Conclusions
• What do we do?
  • Online client classification
  • Deliver modified server actions
  • Measure latency reduction for different clients
  • Compare various actions
• First to do this in a unified framework.
Conclusions: Server and Classifier
• Overhead at server
  • Average: 75 usec; negligible for the end user
  • Turn off classification during overload
  • Serve poor content during overload
• Classification
  • Close to expectation
  • Stable over the duration of the experiment
  • Improve by using select pages and better thresholds
Future
• Other server actions
  • Delta encoding
  • Policies regarding cacheability of objects
• Create a test for clustering
Acknowledgement
• Client testing
  • Saarbruecken, Germany
  • ICIR, USA
• Proxy logs: manufacturing company
Thanks!
Questions?
Slides: http://www.cs.duke.edu/~kvv/www.ppt
Related Work
• Mark network packets [NT02]
• Improve performance of all clients
• Adapt content based on server load [AB99]
• User's expectation [BBK00]
• Alternate admission control and server scheduling