Petrozavodsk State University, Alex Moschevikin, 2003 NET TECHNOLOGIES
Hypertext Transfer Protocol
Thanks to Jim Gettys, Digital Equipment Corporation, 1996 and James Marshall, 1997.
•HTTP/1.1 Authors
–Roy Fielding (UCI)
–Jim Gettys - Editor (Digital ISBU / W3C)
–Jeff Mogul (Digital / WRL)
–Henrik Frystyk Nielsen (W3C)
–Tim Berners-Lee (W3C)
•IETF HTTP Working Group
–Larry Masinter - Working Group Chair
Tim Berners-Lee
Rev. 1.05 / 14.01.2007
temporary location of course "Net Technologies": http://dims.karelia.ru/~alexmou/
HTTP and OSI RM
OSI/RM                                                        TCP/IP
Layer 7 APPLICATION, Layer 6 PRESENTATION, Layer 5 SESSION    HTTP
Layer 4 TRANSPORT                                             TCP
Layer 3 NETWORK                                               IP
Layer 2 DATA LINK, Layer 1 PHYSICAL                           Physical
What is HTTP?
•Application-level protocol for distributed, collaborative, hypermedia information systems.
•Transaction-oriented client/server protocol.
•HTTP uses TCP as transport basis.
•Text-based commands and directives (not binary).
•HTTP (original version) was a "stateless" protocol; each transaction was treated independently. A typical implementation creates a new TCP connection between client and server for each transaction and then terminates the connection as soon as the transaction completes.
•Flexible in formats it can handle.
History of HTTP
•HTTP/0.9, 1990. A graphical user interface with hyperlinks to other information (text, graphics, sound, video, etc.), starting at a "homepage".
•1994 - explosive growth in the number of users on the Net, with many countries providing access.
•HTTP/1.0 (RFC 1945, May 1996), the protocol was improved by allowing messages to be in the format of MIME-like messages, containing metainformation about the data transferred and modifiers on the request/response semantics.
WWW=HTTP
•However, HTTP/1.0 does not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, or virtual hosts.
•HTTP/1.1 (RFC 2068, Jan. 1997), (RFC 2616, June 1999).
Screenshot of the first version of Netscape Navigator, 1994
URI, URL, URN, difference
A Uniform Resource Locator (URL) identifies an Internet resource by its location, that is, by the scheme and address used to reach it, and can be specified in a single line of text. It does not include the fragment part (for example, a # anchor in HTML). More than 30 URI (URL) schemes are registered with IANA.
A Uniform Resource Name (URN) identifies an Internet resource by name rather than by location, and can be specified in a single line of text ("urn:isbn:n-nn-nnnnnn-n").
A Uniform Resource Identifier (URI) is the general term that covers both URLs and URNs.
URI http://www.gleaners.org/faq.html#Q04 (#Q04 is not sent to http server)
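The note about the fragment can be illustrated with a minimal sketch that uses Python's standard `urllib.parse` module. The helper name `request_target` is hypothetical and only shows which part of the URI a client would place in the request line.

```python
from urllib.parse import urlsplit


def request_target(uri: str) -> str:
    """Return the part of the URI a client would send in the request line."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # the fragment (#...) is deliberately left out


uri = "http://www.gleaners.org/faq.html#Q04"
parts = urlsplit(uri)
print(parts.scheme, parts.netloc, parts.path, parts.fragment)
print(request_target(uri))
```

The fragment stays on the client side, where the browser uses it to scroll to the named anchor after the document has been retrieved.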
Format of request or response
Both kinds of messages (request and response) consist of:
•an initial line,
•zero or more header lines,
•a blank line (i.e. a CRLF by itself), and
•an optional message body (e.g. a file, or query data, or query output).
Put another way, the format of an HTTP message is:
<initial line, different for request vs. response>
Header1: value1
Header2: value2
Header3: value3

<optional message body goes here, like file contents or query data; it can be many lines long, or even binary data $&*%@!^$@>
Initial lines and headers should end in CRLF (0D 0A).
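The layout described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `build_message` is hypothetical, and the request line and header are taken from other slides of this course.

```python
CRLF = "\r\n"


def build_message(initial_line, headers, body=b""):
    """Assemble an HTTP message: initial line, headers, blank line, body."""
    lines = [initial_line] + [f"{name}: {value}" for name, value in headers]
    # Every line ends in CRLF; one extra CRLF forms the blank line.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body


message = build_message(
    "GET /path/to/file/index.html HTTP/1.0",
    [("User-Agent", "my_soft/1.0")],
)
print(message)
```

The body is kept as bytes because, as noted above, it may contain binary data.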
Request methods
Method Description
OPTIONS A request for information about the options available.
GET A request to retrieve information.
HEAD Like a GET except that the server's response must not include an entity body; all of the header fields in the response are the same as if the entity body were present. This enables a client to get information about a resource without transferring the entity body.
POST A request to accept the attached entity as a new subordinate to the identified URL.
PUT
DELETE
etc.
Initial request line
A request line has three parts, separated by spaces:
•a method name,
•the local path of the requested resource,
•and the version of HTTP being used.
A typical request line is:
GET /path/to/file/index.html HTTP/1.0
Notes:
•Method names are always uppercase.
•The path is the part of the URL after the host name, also called the request URI.
•The HTTP version always takes the form "HTTP/x.x", uppercase.
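A minimal sketch of how a server might split the request line into its three parts follows. The function name `parse_request_line` is hypothetical, and the checks cover only the notes listed above.

```python
def parse_request_line(line: str):
    """Split a request line into (method, path, (major, minor))."""
    method, path, version = line.rstrip("\r\n").split(" ")
    if not method.isupper():
        raise ValueError("method names are always uppercase")
    if not version.startswith("HTTP/"):
        raise ValueError("version must look like HTTP/x.x")
    major, minor = version[len("HTTP/"):].split(".")
    return method, path, (int(major), int(minor))


print(parse_request_line("GET /path/to/file/index.html HTTP/1.0\r\n"))
```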
Initial response line
The initial response line, called the status line, also has three parts separated by spaces:
•the HTTP version,
•a response status code that gives the result of the request, and
•an English reason phrase describing the status code.
Typical status lines are:
HTTP/1.0 200 OK
or
HTTP/1.0 404 Not Found
Notes: The status code is a three-digit integer, and the first digit identifies the general category of response:
•1xx indicates an informational message only
•2xx indicates success of some kind
•3xx redirects the client to another URL
•4xx indicates an error on the client's part
•5xx indicates an error on the server's part
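The status line can be taken apart in the same way as the request line. The sketch below is illustrative only; the names `parse_status_line`, `category` and `CATEGORIES` are hypothetical, and the category labels paraphrase the list above.

```python
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Split a status line into (version, code, reason phrase)."""
    # The reason phrase may itself contain spaces, so split at most twice.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


def category(code: int) -> str:
    """Classify a status code by its first digit."""
    return CATEGORIES[code // 100]


version, code, reason = parse_status_line("HTTP/1.0 404 Not Found\r\n")
print(version, code, reason, "->", category(code))
```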
The most common status codes
The most common status codes are:
200 OK -- the request succeeded, and the resulting resource (e.g. file or script output) is returned in the message body.
404 Not Found -- the requested resource doesn't exist.
301 Moved Permanently
302 Moved Temporarily
303 See Other (HTTP 1.1 only) -- The resource has moved to another URL (given by the Location: response header), and should be automatically retrieved by the client. This is often used by a CGI script to redirect the browser to an existing file.
500 Internal Server Error -- An unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.
Header lines
One line per header in the form of "Header-Name: value", ending with CRLF (RFC 822 format).
HTTP 1.0 defines 16 headers, though none are required.
HTTP 1.1 defines 46 headers, and one (Host:) is required in requests.
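A simplified sketch of header parsing is shown below. It assumes one header per line and ignores repeated headers and continuation lines; the names `parse_headers` and `is_valid_http11_request`, and the host `www.example.com`, are hypothetical.

```python
def parse_headers(block: str) -> dict:
    """Parse "Header-Name: value" lines up to the first blank line."""
    headers = {}
    for line in block.split("\r\n"):
        if not line:
            break  # the blank line ends the header section
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return headers


def is_valid_http11_request(headers: dict) -> bool:
    """HTTP 1.1 requires the Host: header in every request."""
    return "host" in headers


headers = parse_headers("Host: www.example.com\r\nUser-Agent: my_soft/1.0\r\n\r\n")
print(headers, is_valid_http11_request(headers))
```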
<h1>Happy New Millennium!</h1>
(more file contents)
. . .
</body>
</html>
After sending the response, the server closes the socket.
GET and POST (HTTP/1.0)
POST:
POST /path/script.cgi HTTP/1.0
User-Agent: my_soft/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

home=Cosby&favorite+flavor=flies
GET:
GET /path/script.cgi?home=Cosby&favorite+flavor=flies HTTP/1.0
User-Agent: my_soft/1.0
[blank line here]
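The two requests on this slide carry the same form data in different places. The sketch below builds both from one set of fields using `urlencode` from Python's standard library; the function names `build_get` and `build_post` are hypothetical.

```python
from urllib.parse import urlencode


def build_get(path: str, fields: dict) -> bytes:
    """GET: the encoded form data travels in the query string."""
    return (
        f"GET {path}?{urlencode(fields)} HTTP/1.0\r\n"
        "User-Agent: my_soft/1.0\r\n"
        "\r\n"
    ).encode("ascii")


def build_post(path: str, fields: dict) -> bytes:
    """POST: the encoded form data travels in the message body."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "User-Agent: my_soft/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # length of the body in bytes
        "\r\n"
    ).encode("ascii")
    return head + body


fields = {"home": "Cosby", "favorite flavor": "flies"}
print(build_get("/path/script.cgi", fields).decode("ascii"))
print(build_post("/path/script.cgi", fields).decode("ascii"))
```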
URL-encoding
HTML form data is usually URL-encoded to package it in a GET or POST submission (RFC 2396).
1. Convert all "unsafe" characters in the names and values to "%xx", where "xx" is the ASCII value of the character, in hex. "Unsafe" characters include =, &, %, +, non-printable characters, and any others you want to encode. For simplicity, you might encode all non-alphanumeric characters.
2. Change all spaces to pluses.
3. String the names and values together with = and &, like name1=value1&name2=value2&name3=value3
4. This string is your message body for POST submissions, or the query string for GET submissions.
For example, if a form (in an HTML document) has a field called "Number" that's set to "B52", and a field called "Text" that's set to "You & me", the URL-encoded form data would be Number=B52&Text=You+%26+me with a length of 26.
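The encoding steps can be written out by hand as a short sketch. It takes the simple route mentioned in step 1 and encodes every non-alphanumeric character, so its output can differ from library encoders that leave a few punctuation characters untouched. The names `encode_component` and `encode_form` are hypothetical.

```python
def encode_component(text: str) -> str:
    """URL-encode one form name or value."""
    out = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if byte < 128 and char.isalnum():
            out.append(char)            # safe: ASCII letters and digits
        elif char == " ":
            out.append("+")             # spaces become pluses
        else:
            out.append(f"%{byte:02X}")  # everything else becomes %xx
    return "".join(out)


def encode_form(fields) -> str:
    """Join encoded name=value pairs with '&'."""
    return "&".join(
        f"{encode_component(name)}={encode_component(value)}"
        for name, value in fields
    )


encoded = encode_form([("Number", "B52"), ("Text", "You & me")])
print(encoded, len(encoded))
```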
Features of HTTP/1.1
•Superset of HTTP 1.0.
Improvements:
•Faster response, by allowing multiple transactions to take place over a single persistent connection.
•Faster response and great bandwidth savings, by adding cache support.
•Faster response for dynamically-generated pages, by supporting chunked encoding, which allows a response to be sent before its total length is known.
•Efficient use of IP addresses, by allowing multiple domains to be served from a single IP address (virtual hosts).
HTTP/1.1 clients
To comply with HTTP 1.1, clients must:
•include the Host: header in each request;
•accept responses with chunked data;
•either support persistent connections, or include the "Connection: close" header with each request;
•handle the "100 Continue" response.
Chunked Transfer-encoding
If a server wants to start sending a response before knowing its total length (like with long script output), it might use the simple chunked transfer-encoding, which breaks the complete response into smaller chunks and sends them in series.
A chunked message body contains a series of chunks, followed by a line with "0" (zero), followed by optional footers (just like headers), and a blank line.
Each chunk consists of two parts:
•a line with the size of the chunk data, in hex, possibly followed by a semicolon and extra parameters you can ignore (none are currently standard), and ending with CRLF.
•the data itself, followed by CRLF.
1a; ignore-stuff-here
abcdefghijklmnopqrstuvwxyz
10
1234567890abcdef
0
some-footer: some-value
another-footer: another-value
[blank line here]
Note the blank line after the last footer. The length of the text data is 42 bytes (1a + 10, in hex), and the data itself is abcdefghijklmnopqrstuvwxyz1234567890abcdef. The footers should be treated like headers, as if they were at the top of the response.
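A minimal decoder for this format might look like the sketch below. It assumes the whole chunked body is already in memory, and the function name `decode_chunked` is hypothetical; a real client would read from the socket incrementally.

```python
from typing import Dict, Tuple


def decode_chunked(raw: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Decode a chunked message body into (data, footers)."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        size_line = raw[pos:eol].decode("ascii")
        # The size is in hex; anything after ';' is an ignorable parameter.
        size = int(size_line.split(";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # the zero-size line ends the series of chunks
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF

    footers = {}
    while True:
        eol = raw.index(b"\r\n", pos)
        line = raw[pos:eol].decode("ascii")
        pos = eol + 2
        if not line:
            break  # the blank line ends the footers
        name, _, value = line.partition(":")
        footers[name.strip().lower()] = value.strip()
    return bytes(body), footers


raw = (
    b"1a; ignore-stuff-here\r\n"
    b"abcdefghijklmnopqrstuvwxyz\r\n"
    b"10\r\n"
    b"1234567890abcdef\r\n"
    b"0\r\n"
    b"some-footer: some-value\r\n"
    b"another-footer: another-value\r\n"
    b"\r\n"
)
data, footers = decode_chunked(raw)
print(len(data), data, footers)
```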
Persistent connections
Problem: In practice, most Web pages consist of several files on the same server. In HTTP 1.0 and before, TCP connections are closed after each request and response, so each resource to be retrieved requires its own connection. Opening and closing TCP connections takes a substantial amount of CPU time, bandwidth, and memory.
Solution: Much can be saved by allowing several requests and responses to be sent through a single persistent connection. Persistent connections are the default in HTTP 1.1, so nothing special is required to use them. Just open a connection and send several requests in series (called pipelining), and read the responses in the same order as the requests were sent. If a client includes the "Connection: close" header in the request, then the connection will be closed after the corresponding response.
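As a rough sketch of what a client writes to one persistent connection, the function below concatenates several requests and marks only the last one with "Connection: close". The name `build_pipeline`, the host `www.example.com` and the paths are hypothetical.

```python
def build_pipeline(host: str, paths) -> bytes:
    """Build several GET requests to be sent over one connection."""
    requests = []
    for index, path in enumerate(paths):
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        if index == len(paths) - 1:
            # Ask the server to close the connection after the last response.
            lines.append("Connection: close")
        requests.append("\r\n".join(lines) + "\r\n\r\n")
    return "".join(requests).encode("ascii")


data = build_pipeline("www.example.com", ["/index.html", "/logo.gif"])
print(data.decode("ascii"))
```

The responses would then be read from the same socket in the order the requests were written.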
The "100 Continue" response
On slow channels the server might respond with an interim "100 Continue" response. This means the server has received the first part of the request.
To handle this, a simple HTTP 1.1 client might read one response from the socket; if the status code is 100, discard the first response and read the next one instead.
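The handling described above can be sketched as follows. The sketch assumes that the bytes read from the socket are already in one buffer and that a "100 Continue" response consists only of a status line, optional headers and a blank line; the function name `final_response` is hypothetical.

```python
def final_response(raw: bytes) -> bytes:
    """Discard interim 100 responses and return the final one."""
    while True:
        status_line, _, _ = raw.partition(b"\r\n")
        code = int(status_line.split(b" ", 2)[1])
        if code != 100:
            return raw
        # A 100 response has no body, so it ends at the first blank line.
        _, _, raw = raw.partition(b"\r\n\r\n")


stream = (
    b"HTTP/1.1 100 Continue\r\n\r\n"
    b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"
)
print(final_response(stream))
```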
Web traffic compression
There are a few methods of web traffic compression (gzip, deflate, compress, etc.).
The client asks the HTTP server to use one of the supported compression algorithms, and the server may send the requested document in compressed form. Decompression can begin just after the first bytes of the HTTP response are received (it is not necessary to receive the whole document).
GET / HTTP/1.1
Host: www.google.com
Accept-Encoding: gzip, deflate, compress
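The point that decompression does not need the whole document can be illustrated with Python's standard `zlib` module. In this sketch the "network" is simulated by cutting a gzip-compressed document into 16-byte pieces; the function name `gunzip_stream` and the sample document are hypothetical.

```python
import gzip
import zlib


def gunzip_stream(pieces):
    """Yield decompressed data as each compressed piece arrives."""
    # Adding 16 to the window size tells zlib to expect gzip framing.
    decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for piece in pieces:
        data = decompressor.decompress(piece)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


original = b"<html><body>Hello, compressed world!</body></html>" * 20
compressed = gzip.compress(original)
pieces = [compressed[i:i + 16] for i in range(0, len(compressed), 16)]
restored = b"".join(gunzip_stream(pieces))
print(len(compressed), len(restored))
```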
If-Modified-Since
To avoid sending resources that don't need to be sent, thus saving bandwidth, HTTP 1.1 defines the If-Modified-Since: and If-Unmodified-Since: request headers. The former says "only send the resource if it has changed since this date"; the latter says the opposite.
Clients aren't required to use them, but HTTP 1.1 servers are required to honor requests that do use them.
Unfortunately, due to earlier HTTP versions, the date value may be in any of three possible formats (the first is the preferred one):
If-Modified-Since: Fri, 31 Dec 1999 23:59:59 GMT
If-Modified-Since: Friday, 31-Dec-99 23:59:59 GMT
If-Modified-Since: Fri Dec 31 23:59:59 1999
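A server that wants to accept all three date formats can simply try them in turn. The sketch below uses `datetime.strptime` and assumes an English (C) locale for the day and month names; the names `DATE_FORMATS` and `parse_http_date` are hypothetical.

```python
from datetime import datetime

DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 1123 style (preferred)
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850 style
    "%a %b %d %H:%M:%S %Y",       # C asctime() style
)


def parse_http_date(value: str) -> datetime:
    """Parse an HTTP date given in any of the three formats."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized HTTP date: {value!r}")


for text in (
    "Fri, 31 Dec 1999 23:59:59 GMT",
    "Friday, 31-Dec-99 23:59:59 GMT",
    "Fri Dec 31 23:59:59 1999",
):
    print(parse_http_date(text))
```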
Caching documents
•Server side (for example, in case of dynamically generated pages)
•Client side (in local files on hard disk and memory)
•Intermediate http-proxies
•Not all transactions can be cached, and a client or server can dictate that a certain transaction may be cached only for a given time limit
Caching in HTTP
SERVER
Includes Date: and Expires: headers, or the max-age directive (server-specified expiration times and validators), in the HTTP response.
PROXIES and CLIENTS
How do they know when to discard a cached document, or whether to store it at all?
Cache-Control: max-age=0
Cache-Control: no-cache
Cache-Control: must-revalidate
Pragma: no-cache (HTTP/1.0)
…
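A cache has to read these directives before deciding what to do with a document. The sketch below parses a Cache-Control value and applies a deliberately simplified freshness rule that looks only at no-cache and max-age; it ignores Expires:, Pragma: and the other directives. The names `parse_cache_control` and `may_reuse` are hypothetical.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into a dict of directives."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        # Directives without an argument (e.g. no-cache) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_reuse(directives: dict, age_seconds: int) -> bool:
    """Simplified check: may a cached copy be used without revalidation?"""
    if "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    return max_age is not None and age_seconds < int(max_age)


directives = parse_cache_control("max-age=0, no-cache, must-revalidate")
print(directives, may_reuse(directives, age_seconds=0))
```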
Overall scheme (continued)
Web-SERVER
HTML-document to SERVER HTTP agent
Content-type: text/html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"><HTML> <HEAD> <META HTTP-EQUIV="Cache-control" CONTENT="no-cache"> </HEAD><BODY>Thank you . . .</BODY></HTML>
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <HTML>. . .</HTML>
Overall scheme (continued)
CLIENT HTTP-agent
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <HTML>. . .</HTML>
Thank you . . . HTML-viewer
Secure HTTP (HTTPS)
HTTPS has the same functionality as HTTP, but encrypts the data transferred between client and server by running HTTP over SSL/TLS (RFC 2818).
HTTPS uses TCP port 443 by default.
When a connection to the secure port is established, the following happens automatically:
• The client authenticates the server using the server's digital certificate.
• The client and server negotiate which cipher suite (set of security protocols) to use, and generate session keys for encrypting and decrypting data.
• The client and server establish a secure encrypted connection.
A separate protocol, Secure HTTP (S-HTTP, RFC 2660), has its own headers in its requests and responses and may, for example, encapsulate an HTTP request/response (next slide).
Congratulations, you've won. <A href="/prize.html" CRYPTOPTS="Key-Assign: Inband,alice1,reply,des-ecb;020406080a0c0e0f; SHTTP-Privacy-Enhancements: recv-required=auth">Click here to claim your prize</A>
This HTTP response, encapsulated as an S-HTTP message, becomes: