CPSC 826 Internetworking
Applications & Application-Layer Protocols: The Web & HTTP
Michele Weigle
Department of Computer Science
Clemson University
[email protected]
September 8, 2004
http://www.cs.clemson.edu/~mweigle/courses/cpsc826

Application-Layer Protocols: Outline
The architecture of distributed systems
» Client/Server computing
» P2P computing
» Hybrid (Client/Server and P2P) systems
The programming model used in constructing distributed systems
» Socket programming
Example client/server systems and their application-layer protocols
» The World-Wide Web (HTTP)
» Reliable file transfer (FTP)
» E-mail (SMTP & POP)
» Internet Domain Name System (DNS)
[Figure: hosts in a local ISP, a company network, and a regional ISP, each running the application/transport/network/link/physical protocol stack]
Applications and Application-Layer Protocols: Overview
Application: communicating, distributed processes
» Running in network hosts in “user space”
» Exchange messages to implement application
Application-layer protocols
» One “piece” of an application
» Define messages exchanged and actions taken
» Use services provided by lower-layer protocols
[Figure: end hosts, each running the application/transport/network/link/physical protocol stack, exchanging application-layer messages across the network]
Application-Layer Protocols: The Web
User agent (client) for the Web is called a browser:
» MS Internet Explorer
» Mozilla Firefox
» Apple Safari
» Netscape
Server for the Web is called a Web server:
» Apache (open source)
» MS Internet Information Server (IIS)
Application-Layer Protocols: Web terminology
Web page:
» Addressed by a URL
» Consists of “objects”
Most Web pages consist of:
» Base HTML page
» Embedded objects
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
  <title>CNN.com</title>
  <meta http-equiv="refresh" content="1800; URL=http://www.cnn.com/?">
  <link rel="StyleSheet" href="http://i.cnn.net/cnn/virtual/2001/style/main.css" type="text/css">
  <script language="JavaScript1.1" src="http://i.cnn.net/cnn/virtual/2000/code/main.js" type="text/javascript"> </script>
  <script language="JavaScript1.1" type="text/javascript"> </script>
  <script language="JavaScript1.1" src="http://ar.atwola.com/file/adsWrapper.js"></script>
  <style type="text/css"></style>
  <script language="JavaScript">document.adoffset=0</script>
</head>
<body class="cnnMainBody" bgcolor="#FFFFFF">
<a name="top_of_page"></a>
  :
  :
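To see the split between the base page and its embedded objects, a browser-like client can scan the base HTML for references it must fetch next. The sketch below uses Python's standard `html.parser`; it treats only `img src`, `script src`, and `link href` as embedded objects, which is an illustrative subset (a real browser also follows CSS imports, frames, and more). The sample page and `logo.gif` are hypothetical, loosely modeled on the excerpt above.

```python
from html.parser import HTMLParser


class EmbeddedObjectFinder(HTMLParser):
    """Collect URLs of objects a browser would request after the base page."""

    # (tag, attribute) pairs treated as embedded-object references.
    # Illustrative subset only; every <link href> is counted for simplicity.
    REFERENCE_ATTRS = {("img", "src"), ("script", "src"), ("link", "href")}

    def __init__(self) -> None:
        super().__init__()
        self.objects: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        for name, value in attrs:
            if (tag, name) in self.REFERENCE_ATTRS and value:
                self.objects.append(value)


def embedded_objects(html: str) -> list[str]:
    finder = EmbeddedObjectFinder()
    finder.feed(html)
    finder.close()
    return finder.objects


page = """<html><head>
<link rel="StyleSheet" href="http://i.cnn.net/cnn/virtual/2001/style/main.css" type="text/css">
<script src="http://i.cnn.net/cnn/virtual/2000/code/main.js"></script>
</head><body><img src="logo.gif"></body></html>"""

for url in embedded_objects(page):
    print(url)
```

Each printed URL would cost the browser one more HTTP request, which is why the number of embedded objects matters so much for the connection-management choices discussed later.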
Web Terminology: URLs (Uniform Resource Locators)
www.someSchool.edu:8080/someDept/pic.gif
» www.someSchool.edu: server domain name
» 8080: optional server port (default = port 80)
» /someDept/pic.gif: object path name
URL components
» Server address
» (Optional) port number
» Path name
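Python's standard `urllib.parse.urlsplit` can pull out the three components listed above. This is a minimal sketch; it assumes the URL carries an explicit `http://` scheme (added here because `urlsplit` may misread a bare `host:port/path` string as a scheme) and falls back to port 80 when the URL gives none.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when the URL omits a port


def url_components(url: str) -> tuple[str, int, str]:
    """Split an http:// URL into (server name, port, object path)."""
    parts = urlsplit(url)
    host = parts.hostname  # urllib lowercases the host name
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    path = parts.path or "/"  # an empty path means the server's root object
    return host, port, path


print(url_components("http://www.someSchool.edu:8080/someDept/pic.gif"))
print(url_components("http://www.someSchool.edu/someDept/home.index"))
```

The first call keeps the explicit port 8080; the second has no port in the URL, so the default of 80 is filled in.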
Web Terminology: The Hypertext Transfer Protocol (HTTP)
Web’s application-layer protocol
Client/server model
» client: browser that requests, receives, “displays” Web objects
» server: Web server sends objects in response to requests
[Figure: a PC running Firefox and a Mac running Safari each send HTTP requests to a server running Apache and receive HTTP responses]
HTTP/1.0: RFC 1945
HTTP/1.1: RFC 2616
The Hypertext Transfer Protocol: HTTP Overview
HTTP uses TCP sockets
» Browser initiates TCP connection to server (on port 80)
HTTP messages (application-layer protocol messages) exchanged between browser and Web server
HTTP/1.0: RFC 1945
» One request/response interaction per connection
HTTP/1.1: RFC 2616
» Persistent connections
» Pipelined connections
HTTP is “stateless”
» Server maintains no information about past browser requests
Aside: protocols that maintain “state” are complex!
» Past history (state) must be maintained
» If server or client crashes, their views of “state” may be inconsistent and must be reconciled
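The bullets above can be made concrete with a plain TCP socket: the browser opens a connection to port 80 and writes a text request message. This is a minimal sketch, not a full client. It assumes an HTTP/1.0 request with only a `Host` header (optional in HTTP/1.0 but widely expected by servers), and `fetch` is defined but not called here because it needs a reachable server. Because HTTP is stateless, everything the server needs must be in this one message.

```python
import socket


def build_get_request(host: str, path: str) -> bytes:
    """Format a minimal HTTP/1.0 GET request message."""
    lines = [
        f"GET {path} HTTP/1.0",  # request line: method, object path, version
        f"Host: {host}",         # header line; one header per line
        "",                      # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Open a TCP connection, send one request, read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0 default: server closes after the response
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_get_request("www.someSchool.edu", "/someDept/home.index").decode())
```

Reading until the server closes the connection matches the HTTP/1.0 default of one request/response per connection; a persistent HTTP/1.1 client would instead use the response headers to find where each response ends.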
The Hypertext Transfer Protocol: HTTP example
User enters URL www.someSchool.edu/someDept/home.index
» Referenced object contains HTML text and references 10 JPEG images
Browser sends an HTTP “GET” request to the server www.someSchool.edu
Server will retrieve and send the HTML file
Browser will read the file and sequentially make 10 separate requests for the embedded JPEG images
[Figure: Browser/Web Server timeline showing HTTP request 1 and HTTP response 1 for the base page, followed by further requests up to HTTP request 11]
HTTP Protocol Design: Non-persistent connections
The default browser/server behavior in HTTP/1.0 is for the connection to be closed after the completion of the request
» Server parses request, responds, and closes TCP connection
» The Connection: keep-alive header allows for persistent connections
[Figure: Browser/Web Server timeline showing TCP connection establishment, then the HTTP request and the HTTP response]
With non-persistent connections, at least 2 RTTs are required to fetch every object
» 1 RTT for TCP handshake
» 1 RTT for request/response
Non-Persistent Connections: Performance
[Figure: delay components on the path from node A to node B (nodal processing, queueing, transmission, propagation), next to the Browser/Web Server timeline of TCP connection establishment, HTTP request, and HTTP response]
Non-Persistent Connections: Performance
Example: a 1 Kbyte base page with five 1.5 Kbyte embedded images coming from the West coast on an OC-48 link
» 1 RTT for TCP handshake = 0.004 ms + 50 ms
» 1 RTT for request/response = 0.006 ms + 50 ms
Page download time with non-persistent connections?
Page download time with a persistent connection?
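The two questions can be worked through with the RTT figures given above. This is a sketch under simplifying assumptions not stated on the slide: every request/response exchange costs the same 0.006 ms + 50 ms regardless of object size, server processing time is zero, and TCP slow start is ignored. "Persistent" here means persistent without pipelining.

```python
# RTT figures from the OC-48 example (transmission + propagation), in ms.
HANDSHAKE_RTT_MS = 0.004 + 50
REQUEST_RTT_MS = 0.006 + 50


def non_persistent_ms(n_objects: int) -> float:
    """Every object pays for its own TCP handshake plus one request/response."""
    return n_objects * (HANDSHAKE_RTT_MS + REQUEST_RTT_MS)


def persistent_ms(n_objects: int) -> float:
    """One handshake, then one request/response per object (no pipelining)."""
    return HANDSHAKE_RTT_MS + n_objects * REQUEST_RTT_MS


objects = 1 + 5  # base page plus five embedded images
print(f"non-persistent: {non_persistent_ms(objects):.3f} ms")
print(f"persistent:     {persistent_ms(objects):.3f} ms")
```

Under these assumptions the page takes about 600 ms with non-persistent connections and about 350 ms with one persistent connection; the saving is exactly the five avoided handshakes.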
Non-Persistent Connections: Parallel connections
To improve performance, a browser can issue multiple requests in parallel to a server (or servers)
» Server parses request, responds, and closes TCP connection
[Figure: the Browser opens two TCP connections to Web Servers at the same time; each connection has its own TCP connection establishment, HTTP request, and HTTP response]
Page download time with parallel connections?
» 2 parallel connections = ?
» 4 parallel connections = ?
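One way to estimate these times, reusing the RTT figures from the OC-48 example: fetch the base page first, then fetch the five images in rounds of k simultaneous non-persistent connections, each costing a handshake plus a request/response. This sketch assumes the parallel connections do not slow one another down (plausible on an OC-48 link where transmission time is tiny) and ignores server processing.

```python
import math

HANDSHAKE_RTT_MS = 0.004 + 50
REQUEST_RTT_MS = 0.006 + 50
PER_OBJECT_MS = HANDSHAKE_RTT_MS + REQUEST_RTT_MS  # non-persistent: 2 RTTs each


def parallel_ms(n_embedded: int, n_parallel: int) -> float:
    """Base page first, then embedded objects in rounds of n_parallel connections."""
    rounds = math.ceil(n_embedded / n_parallel)
    return PER_OBJECT_MS * (1 + rounds)


for k in (2, 4):
    print(f"{k} parallel connections: {parallel_ms(5, k):.2f} ms")
```

With 2 connections the five images need 3 rounds (about 400 ms in total); with 4 connections they need 2 rounds (about 300 ms). Beyond that, adding connections stops helping once every image gets its own.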
HTTP Protocol Design: Persistent v. non-persistent connections
Non-persistent
» HTTP/1.0
» Server parses request, responds, and closes TCP connection
» At least 2 RTTs to fetch every object
Persistent
» Default for HTTP/1.1 (negotiable in 1.0)
» Client sends requests for multiple objects on one TCP connection
» Server parses request, responds, parses next request, responds, ...
» Fewer RTTs
Parallel v. persistent connections?
Persistent Connections: Persistent connections with pipelining
Persistent without pipelining:
» Client issues new request only when previous response has been received
» One RTT for each referenced object
Persistent with pipelining:
» Default in HTTP/1.1
» Client sends requests as soon as it encounters a referenced object
» As little as one RTT for all the referenced objects
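Counting RTTs makes the difference between the three strategies explicit. This sketch ignores transmission time and assumes the best case for pipelining, where all embedded-object requests go out together after the base page arrives. It uses the earlier example page with 10 JPEG images.

```python
def rtts_to_fetch(n_embedded: int, mode: str) -> int:
    """RTTs for a base page plus n_embedded objects, ignoring transmission time."""
    if mode == "non-persistent":
        # Handshake + request/response for the base page and every object.
        return 2 * (1 + n_embedded)
    if mode == "persistent":
        # One handshake, then one request/response RTT per object.
        return 1 + (1 + n_embedded)
    if mode == "pipelined":
        # Handshake, base page, then one RTT covering all embedded objects.
        return 1 + 1 + (1 if n_embedded else 0)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("non-persistent", "persistent", "pipelined"):
    print(mode, rtts_to_fetch(10, mode))
```

For 10 images this gives 22, 12, and 3 RTTs respectively.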
Persistent Connections: Without Pipelining
[Figure: Client/Server timeline: HTTP request msg, then base HTTP response msg; HTTP request msg (1st embedded object), then its HTTP response msg; HTTP request msg (2nd embedded object), then its HTTP response msg]
Client issues new request only when previous response has been received
One RTT for each referenced object
Persistent Connections: With Pipelining
[Figure: Client/Server timeline: HTTP request msg and base HTTP response msg, then HTTP request msgs for the 1st and 2nd embedded objects sent without waiting for earlier responses, followed by the corresponding HTTP response msgs]
Default in HTTP/1.1
Client sends requests as soon as it encounters a referenced object
As little as one RTT for all the referenced objects
HTTP User-Server Interaction: Authentication
Problem: How to limit access to server documents?
» Servers provide a means to require users to authenticate themselves
HTTP includes a header line for the user to specify name and password (on a GET request)
» If no authorization is presented, the server refuses access and sends a WWW-Authenticate: header line in the response
Stateless: client must send authorization for each request
» A stateless design
» (But browser may cache credentials)
[Figure: Client/Server timeline: a usual HTTP request msg is answered by a 401 authorization response carrying WWW-Authenticate:; each later usual HTTP request msg + Authorization: is answered by a usual HTTP response msg]
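The slide leaves the header format generic. One concrete scheme is HTTP Basic authentication (RFC 7617), where the client sends `Authorization: Basic` followed by the base64 encoding of `user:password`. The sketch below builds and parses that header line; the helper names and the example credentials are illustrative. Note that base64 is an encoding, not encryption, so Basic credentials are readable by anyone who sees the request.

```python
import base64
from typing import Tuple


def basic_authorization_header(user: str, password: str) -> str:
    """Client side: build an Authorization header line (Basic scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def parse_basic_credentials(header_value: str) -> Tuple[str, str]:
    """Server side: recover (user, password) from a 'Basic <token>' value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not Basic authentication")
    # The user name cannot contain ':', so split at the first one.
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


# HTTP is stateless: the client attaches this line to every request.
print(basic_authorization_header("Aladdin", "open sesame"))
```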
HTTP User-Server Interaction: Cookies
Server sends “cookie” to browser in response message
Set-Cookie: <value>
Browser presents cookie in later requests to same server
Cookie: <value>
Server matches cookie with server-stored information
» Provides authentication
» Client-side state maintenance (remembering user preferences, previous choices, …)
[Figure: Client/Server timeline: a usual HTTP request msg is answered by a usual HTTP response + Set-cookie: S1; later usual HTTP request msgs carrying cookie: S1 trigger a cookie-specific action at the server and get a usual HTTP response msg; a later response carries Set-cookie: S2]
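The exchange above amounts to the server keeping a table keyed by the cookie value. The sketch below is a hypothetical toy server (`CookieServer` is not a real library class) that hands out a random ID on the first visit and uses it to look up stored state later. It simplifies the real header syntax: actual `Set-Cookie`/`Cookie` headers carry `name=value` pairs and attributes, while here the bare value is used, as on the slide.

```python
import secrets
from typing import Dict, Tuple


class CookieServer:
    """Toy server keeping per-user state keyed by a cookie value (hypothetical)."""

    def __init__(self) -> None:
        # Server-stored information, one entry per cookie value handed out.
        self.state: Dict[str, Dict[str, int]] = {}

    def handle(self, request_headers: Dict[str, str]) -> Tuple[Dict[str, str], str]:
        """Answer one request; returns (response headers, body)."""
        cookie = request_headers.get("Cookie")
        if cookie is None or cookie not in self.state:
            # No known cookie: create an ID and ask the browser to store it.
            cookie = secrets.token_hex(8)
            self.state[cookie] = {"visits": 1}
            return {"Set-Cookie": cookie}, "first visit"
        # Known cookie: a cookie-specific action based on the stored state.
        self.state[cookie]["visits"] += 1
        return {}, f"visit number {self.state[cookie]['visits']}"


server = CookieServer()
headers, body = server.handle({})                            # first request
_, body2 = server.handle({"Cookie": headers["Set-Cookie"]})  # later request
print(body, "/", body2)
```

HTTP itself stays stateless: the state lives in the server's table and in the browser's stored cookie, and each request simply carries the key.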
HTTP User-Server Interaction: Browser caches
[Figure: browser with a disk cache; a hit is served from the disk cache, a miss goes across the Internet to the origin server]
Browsers cache content from servers to avoid future server interactions to retrieve the same content
HTTP User-Server Interaction: The conditional GET
If object in browser cache is “fresh,” the server won’t re-send it
» Browsers save current date along with object in cache
Client specifies the date of cached copy in HTTP request
If-Modified-Since: <date>
Server’s response contains the object only if it has been changed since the cached date
Otherwise server returns:
HTTP/1.0 304 Not Modified
[Figure: Client/Server timeline: a request with If-modified-since: <date> for an object not modified gets HTTP/1.0 304 Not Modified; a request with If-modified-since: <date> for an object modified gets HTTP/1.0 200 OK … <data>]
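The server's side of a conditional GET is a single date comparison. This sketch uses Python's standard `email.utils` helpers, which read and write the RFC 1123 date format HTTP uses. It assumes `last_modified` is a timezone-aware UTC datetime; `answer_conditional_get` and the dates in the usage lines are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def answer_conditional_get(last_modified: datetime,
                           if_modified_since: Optional[str],
                           body: bytes) -> Tuple[str, bytes]:
    """Return (status line, body) for a GET that may carry If-Modified-Since."""
    if if_modified_since is not None:
        cached_date = parsedate_to_datetime(if_modified_since)
        if cached_date.tzinfo is None:  # treat a zone-less date as UTC
            cached_date = cached_date.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds first.
        if last_modified.replace(microsecond=0) <= cached_date:
            return "HTTP/1.0 304 Not Modified", b""  # cached copy still fresh
    return "HTTP/1.0 200 OK", body  # send the (new) object


modified = datetime(2004, 9, 1, 12, 0, tzinfo=timezone.utc)
fresh = format_datetime(datetime(2004, 9, 8, tzinfo=timezone.utc), usegmt=True)
stale = format_datetime(datetime(2004, 8, 1, tzinfo=timezone.utc), usegmt=True)
print(answer_conditional_get(modified, fresh, b"<data>")[0])  # HTTP/1.0 304 Not Modified
print(answer_conditional_get(modified, stale, b"<data>")[0])  # HTTP/1.0 200 OK
```

A 304 response carries no body, so the saving is the object's transmission time; the request still costs one RTT.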
HTTP User-Server Interaction: Cache Performance for HTTP Requests
What is the average time to retrieve a web object?
» Tmean = hit ratio × Tcache + (1 - hit ratio) × Tserver
where hit ratio is the fraction of objects found in the cache
» Mean access time from a disk cache = 10 ms
» Mean access time from the origin server = 1,000 ms
For a 60% hit ratio, the mean client access time is:
» (0.6 × 10 ms) + (0.4 × 1,000 ms) = 406 ms
[Figure: browser with a disk cache; a cache hit is served locally, a cache miss goes across the network to the origin server]
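The mean-access-time formula translates directly into a function. This sketch simply encodes Tmean = hit ratio × Tcache + (1 - hit ratio) × Tserver and reproduces the 60% example with the 10 ms and 1,000 ms figures.

```python
def mean_access_time(hit_ratio: float, t_cache: float, t_server: float) -> float:
    """T_mean = hit_ratio * T_cache + (1 - hit_ratio) * T_server."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return hit_ratio * t_cache + (1 - hit_ratio) * t_server


# Slide example, times in ms: about 406 ms.
print(mean_access_time(0.6, 10, 1000))
```

Because the server term dominates, each extra 10% of hit ratio here removes roughly 99 ms from the mean, which is why the factors that raise the hit ratio matter so much.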
Cache Performance for HTTP Requests: What determines the hit ratio?
Cache size
Locality of references
» How often the same web object is requested
How long objects remain “fresh” (unchanged)
Object references that can’t be cached at all
» Dynamically generated content
» Protected content
» Content purchased for each use
» Content that must always be up-to-date
» Advertisements (“pay-per-click” issues)
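The effect of cache size and locality can be seen by replaying a request trace through a small cache. The sketch below assumes a least-recently-used (LRU) replacement policy, which is one common choice and not something the slide specifies; the trace of object names is invented for illustration.

```python
from collections import OrderedDict
from typing import List


def lru_hit_ratio(requests: List[str], capacity: int) -> float:
    """Fraction of requests served from an LRU cache holding `capacity` objects."""
    cache = OrderedDict()  # keys are object names, ordered oldest to newest use
    hits = 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = None  # miss: fetch the object and store it
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used object
    return hits / len(requests) if requests else 0.0


trace = ["a", "b", "a", "c", "a", "b", "d", "a", "b", "c"]  # hypothetical trace
for size in (1, 2, 3):
    print(size, lru_hit_ratio(trace, size))
```

On this trace the hit ratio climbs from 0.0 to 0.2 to 0.5 as the cache grows from one to three objects, because the frequently re-requested objects `a` and `b` stay resident.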
A Historical Digression on the Web: Van Jacobson’s Web flame
The Impact of Web Traffic on the Internet: MCI backbone traffic in bytes by protocol (1998)
Traffic Makeup on UNC Link: Inbound traffic
Port 412 = file sharing
Port 7668, 6349 = You tell me!!
Caching on the Web: Web caches (Proxy servers)
Web caches are used to satisfy client requests without contacting the origin server
Users configure browsers to send all requests through a shared proxy server
» Proxy server is a large cache of web objects
Browsers send all HTTP requests to proxy
» If object in cache, proxy returns object in HTTP response
» Else proxy requests object from origin server, then returns it in HTTP response to browser
[Figure: two clients send HTTP requests to a proxy server; on a hit the proxy answers directly, on a miss it forwards the HTTP request to the origin server and relays the HTTP response]
Open research question: How does the proxy hit ratio change with the population of users sharing it?
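The proxy's decision rule (serve hits locally, forward misses to the origin and keep a copy) fits in a few lines. In this sketch, `ProxyCache` and `fake_origin` are hypothetical names; the origin call is injected as a function so the example runs without a network. It omits freshness checks, eviction, and non-cacheable content, all of which a real proxy must handle.

```python
from typing import Callable, Dict, List


class ProxyCache:
    """Sketch of a proxy: answer from the cache on a hit, else ask the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin-server call
        self.store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:  # hit: no contact with the origin server
            self.hits += 1
            return self.store[url]
        self.misses += 1  # miss: forward the request to the origin
        body = self.fetch_from_origin(url)
        self.store[url] = body  # keep a copy for later clients
        return body


# Stand-in origin server that records which objects it was asked for.
origin_log: List[str] = []


def fake_origin(url: str) -> bytes:
    origin_log.append(url)
    return f"<contents of {url}>".encode()


proxy = ProxyCache(fake_origin)
for url in ["/a.gif", "/b.gif", "/a.gif", "/a.gif"]:
    proxy.get(url)
print(proxy.hits, proxy.misses, origin_log)
```

Because the cache is shared, a request from one client can be a hit thanks to an earlier request from another client, which is the intuition behind the open question about user population.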
Why do Proxy Caching? The performance implications of caching
Consider a cache that is “close” to client
» E.g., on the same LAN
Nearby caches mean:
» Smaller response times
» Decreased traffic on egress link to institutional ISP (often the primary bottleneck)
To improve Web response times, should one buy a 10 Mbps access link or a proxy server?
[Figure: campus network on a 10 Mbps LAN with a proxy server, connected by a 1.5 Mbps access link to the public Internet and the origin servers]
Why do Proxy Caching? The performance implications of caching
Web performance without caching:
» Mean object size = 50 Kbits
» Mean request rate = 29/sec
» Mean origin server access time = 1 sec
» Average response time = ??
Traffic intensity on the access link:
$\frac{29\ \text{req/sec} \times 50\ \text{Kbits/req}}{1.5\ \text{Mbps}} \approx 0.97$
[Figure: campus network (10 Mbps LAN) connected by a 1.5 Mbps access link to the public Internet; origin servers are 1000 ms away]
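The traffic-intensity figure is offered load divided by link capacity. This sketch assumes 1 Kbit = 1,000 bits and 1 Mbps = 1,000,000 bits/sec, which is what the 0.97 figure implies. It also shows the same load on the 10 Mbps upgrade considered next.

```python
def traffic_intensity(req_per_sec: float, bits_per_req: float, link_bps: float) -> float:
    """Fraction of the link's capacity consumed by the offered load."""
    return req_per_sec * bits_per_req / link_bps


rho_slow = traffic_intensity(29, 50_000, 1_500_000)   # 1.5 Mbps access link
rho_fast = traffic_intensity(29, 50_000, 10_000_000)  # 10 Mbps access link
print(f"{rho_slow:.2f}")  # prints 0.97
print(f"{rho_fast:.3f}")  # prints 0.145
```

As traffic intensity approaches 1, queueing delay on the link grows very large, so at 0.97 the access link, not the 1 sec origin access time, dominates the response time.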
Why do Proxy Caching? The performance implications of caching
Upgrade the access link to 10 Mb/s
» Queuing is negligible, hence response time = 1 sec
Add a proxy cache with 40% hit ratio and 10 ms access time
» Traffic intensity on access link = 0.6 × 0.97 = 0.58
» Response time = 0.4 × 10 ms + 0.6 × 1,000 ms = 604 ms
A proxy cache lowers response time, lowers access link utilization, and saves money!
[Figure: campus network (10 Mbps LAN) with a proxy server, connected by a 1.5 Mbps access link to the public Internet; origin servers are 1000 ms away]
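Both numbers for the proxy option follow from one observation: only misses cross the access link. This sketch assumes, as the slide does, that queueing delay at the reduced utilization is negligible, so a miss costs the 1,000 ms origin access time and a hit costs the 10 ms cache access time.

```python
from typing import Tuple


def proxy_effect(hit_ratio: float, t_cache_ms: float, t_origin_ms: float,
                 base_intensity: float) -> Tuple[float, float]:
    """Return (access-link traffic intensity, mean response time in ms)."""
    miss_ratio = 1 - hit_ratio
    link_intensity = miss_ratio * base_intensity  # only misses use the link
    response_ms = hit_ratio * t_cache_ms + miss_ratio * t_origin_ms
    return link_intensity, response_ms


intensity, response = proxy_effect(0.4, 10, 1_000, 0.97)
print(f"access link intensity = {intensity:.2f}")  # prints 0.58
print(f"mean response time = {response:.0f} ms")   # prints 604 ms
```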
Why do Proxy Caching? The case for proxy caching
Lower latency for user’s web requests
Reduced traffic at all network levels
Reduced load on servers
Some level of fault tolerance (network, servers)
Reduced costs to ISPs, content providers, etc., as web usage continues to grow exponentially