CIS 5930-04 – Spring 2001

[email protected] 1


http://aspen.csit.fsu.edu/it1spring01

Instructors: Geoffrey Fox, Bryan Carpenter
Computational Science and Information Technology
Florida State University

Acknowledgements: Nancy McCracken
Syracuse University

Part 6: Introduction to CGI and Servlets


Introduction

RMI gave us one approach to client/server programming.

The approach was based on the Java language and some far-reaching ideas about remote objects, object serialization, and dynamic class loading.

We could achieve direct integration into the traditional World Wide Web through applets, but the technology is not specifically tied to the Web.

RMI is powerful and general (and interesting), but it can be a slightly heavy-handed approach if actually we only need to interact with users through Web pages.

For the future, it may be more natural to view RMI as a technology for the “middle tier” (or for connectivity in the LAN) rather than for the Web client.

HTML Forms and CGI

There are long-established techniques for getting information from users through Web browsers (predating the appearance of Java on the Web).

The FORM element of HTML can contain a variety of input fields.

The data entered by the user is collected by the browser, suitably encoded, and forwarded to the Web server.

On the server side, the Web server is configured to execute an arbitrary program that processes the user’s form inputs.

This program typically outputs a dynamically generated HTML document containing an appropriate response to the user’s input.

The server-side mechanism is called CGI: the Common Gateway Interface.


CGI and Servlets

In conventional CGI, a Web site developer writes the executable programs that process form inputs in a language such as Perl or C.

The program (or script) is executed once each time a form is submitted.

Servlets provide a more modern, Java-centric approach.

The server incorporates a Java Virtual Machine, which is running continuously.

Invocation of a CGI script is replaced by invocation of a method on a servlet object.


Advantages of Servlets

Invocation of a single Java method is typically much cheaper than starting a whole new program, so servlets are typically more efficient than CGI scripts.
– This is important if we are planning to centralize processing in the server (rather than, say, delegate processing to an applet or browser script).

Besides this we have the usual advantages of Java:
– Portability.
– A fully object-oriented environment for large-scale program development.
– Library infrastructure for decoding form data, handling cookies, etc. (although many of these things are also available in Perl).
– Servlets are the foundation for JavaServer Pages.


Plan of this Lecture Set

Review HTML forms and associated HTTP requests.

Briefly describe traditional CGI programming.

Detailed discussion of Java servlets:
– Deploying Tomcat as a standalone Web server.
– Simple servlets.
– The servlet life cycle.
– Servlet requests and responses. More on the HTTP protocol.
– Approaches to session tracking. Handling cookies.
– The servlet session-tracking API.


References

Core Servlets and JavaServer Pages, Marty Hall, Prentice Hall, 2000.
– Good coverage and current, with some discussion of the Tomcat server.

Java Servlet Programming, Jason Hunter and William Crawford, O’Reilly, 1998.
– Also good, with some good examples. Slightly out of date.

Java Servlet Specification, v2.2, and other documents, at: http://java.sun.com/products/servlet/


HTML Forms


The HTTP GET request

Before discussing forms, let’s look again at how the GET request normally works.

The following server program listens for HTTP requests, and simply prints the received request to the console.


A Dummy Web Server

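import java.io.* ;
import java.net.* ;

public class DummyServer {
  public static void main(String [] args) throws Exception {
    ServerSocket server = new ServerSocket(8080) ;
    while(true) {
      Socket sock = server.accept() ;
      BufferedReader in = new BufferedReader(
          new InputStreamReader(sock.getInputStream())) ;

      // The first line of the request is the method field
      String method = in.readLine() ;
      System.out.println(method) ;

      // Print the header fields, up to the empty line that ends them
      while(true) {
        String field = in.readLine() ;
        if(field == null) break ;   // client closed the connection
        System.out.println(field) ;
        if(field.length() == 0) break ;
      }

      // Send a dummy response to the client socket (a minimal placeholder page)
      PrintWriter out = new PrintWriter(sock.getOutputStream()) ;
      out.print("HTTP/1.0 200 OK\r\n") ;
      out.print("Content-Type: text/html\r\n\r\n") ;
      out.print("<html><body>Request received</body></html>\r\n") ;
      out.flush() ;
      sock.close() ;
    }
  }
}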


A GET Request

On the host sirah I run the dummy server:

sirah$ java DummyServer

Now I point a browser at http://sirah.csit.fsu.edu:8080/index.html

The dummy server program might print:

GET /index.html HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.51 [en] (X11; I; ...)
Host: sirah.csit.fsu.edu:8080
Accept: image/gif, ..., */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
<blank line>


Fields of the GET request

The HTTP GET request consists of a series of text fields on separate lines, ended by an empty line.

The first line is the most important: it is called the method field.

In simple GET requests, the second token in the method line is the requested file name, expressed as a path relative to the document root of the server.
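As a rough illustration of the structure just described, the following Java sketch splits a method field such as GET /index.html HTTP/1.0 into its three tokens. The class name RequestLine is hypothetical, and separating anything after a ? into a query string anticipates the form submissions discussed below; a real server would also have to cope with escaped characters and malformed input more carefully.

```java
// Hypothetical helper: splits an HTTP method field (request line) such as
// "GET /index.html HTTP/1.0" into method, path, optional query and version.
class RequestLine {
    final String method;   // e.g. "GET" or "POST"
    final String path;     // requested path, relative to the document root
    final String query;    // text after the first '?', or null if there is none
    final String version;  // e.g. "HTTP/1.0"

    RequestLine(String line) {
        // Tokens are separated by spaces; split(" +") tolerates repeated spaces.
        String[] tokens = line.trim().split(" +");
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Malformed method field: " + line);
        }
        method = tokens[0];
        version = tokens[2];
        int q = tokens[1].indexOf('?');
        if (q >= 0) {
            path = tokens[1].substring(0, q);
            query = tokens[1].substring(q + 1);
        } else {
            path = tokens[1];
            query = null;
        }
    }

    public static void main(String[] args) {
        RequestLine r = new RequestLine("GET /dummy?who=Bryan HTTP/1.0");
        System.out.println(r.method + " | " + r.path + " | " + r.query + " | " + r.version);
        // prints: GET | /dummy | who=Bryan | HTTP/1.0
    }
}
```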


A Simple HTML Form

The form element includes one or more input elements, along with any ordinary HTML markup:

<html>
<body>
<form method=get
      action="http://sirah.csit.fsu.edu:8080/dummy">
Name: <input type=text name=who size=32>
<p>
<input type=submit>
</form>
</body>
</html>


Remarks

The form tag includes important attributes method and action.

The method attribute defines the kind of HTTP request sent when the form is submitted: its value can be get or post (see later).

The action attribute is a URL. In normal use it will locate an executable program on the server. In this case it is a reference to my “dummy server”.

An input tag with type attribute text represents a text input field.

An input tag with type attribute submit represents a “submit” button.


Displaying the Form

If I place this HTML document on a Web Server at a suitable location, and visit its URL with a browser, I see something like:


Submitting the Form

If I type my name and click on the “Submit Query” button, the dummy server running on sirah prints:

GET /dummy?who=Bryan HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.51 [en] (X11; I; ...)
Host: sirah.csit.fsu.edu:8080
Accept: image/gif, ..., */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
<blank line>


Remarks

When the form specifying the get method is submitted, the values inputted by the user are effectively appended to the end of the URL specified in the action attribute.

In the HTTP GET request—sent when the submit button is pressed—they appear attached to the second token of the first line of the request.

In simple cases the appended string begins with a ?, followed by pairs of the form name=value, where name is the name appearing in the name attribute of the input tag, and value is the value entered by the user.

If the form has multiple input fields, the pairs are separated by &.
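The browser’s side of a GET submission can be sketched in Java as follows. QueryBuilder and buildUrl() are hypothetical names; the sketch assumes the standard java.net.URLEncoder for encoding each name and value (the encoding itself is described under URL Encoding below), and uses a simple Map, which cannot represent a name submitted more than once.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper mimicking what a browser does for a GET form:
// URL-encode each name and value, join the pairs with '&', and append
// the result to the action URL after a '?'.
class QueryBuilder {
    static String buildUrl(String action, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(action);
        char separator = '?';
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    // URLEncoder turns spaces into '+' and characters other than letters,
    // digits and . - * _ into %XX sequences.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("surname", "Carpenter");
        fields.put("forenames", "David Bryan");
        System.out.println(buildUrl("http://sirah.csit.fsu.edu:8080/dummy", fields));
        // prints: http://sirah.csit.fsu.edu:8080/dummy?surname=Carpenter&forenames=David+Bryan
    }
}
```

A LinkedHashMap is used so that the pairs come out in the order the fields were added, as a browser would send them in document order.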


POST requests

This method of attaching input data to the URL is handy if the user has a relatively simple query (e.g. for a search engine).

For more complex forms it is usually recommended to specify the post method in the form tag, e.g.:

<form method=post
      action="http://sirah.csit.fsu.edu:8080/dummy">

In the HTTP protocol, a POST request differs from a GET request by having some data appended after the headers.


A Form Using the POST Method

<form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
Surname: <input type=text name=surname size=32>
<p>
Forenames: <input type=text name=forenames size=40>
<p>
<input type=submit>
</form>


Extending the Dummy Server

We can modify the dummy server to display POST requests by declaring an int variable contentLength, initialized to 0 before the header loop, adding the lines

if(field.regionMatches(true, 0, "Content-Length: ", 0, 16))
  contentLength = Integer.parseInt(field.substring(16).trim()) ;

inside the loop that reads the headers, and adding

for(int i = 0 ; i < contentLength ; i++) {
  int b = in.read() ;   // URL-encoded form data is plain ASCII
  if(b < 0) break ;     // stream ended early
  System.out.print((char) b) ;
}
System.out.println() ;

after that loop.


Submitting the Form

When I click on the “Submit Query” button, the dummy server prints:

POST /dummy HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.51 [en] (X11; I; ...)
Host: sirah.csit.fsu.edu:8080
Accept: image/gif, ..., */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
Content-type: application/x-www-form-urlencoded
Content-Length: 39

surname=Carpenter&forenames=David+Bryan


Remarks

The method field (the first line) now starts with the word POST instead of GET; the data is not appended to the URL.

There are a couple more fields in the header, describing the format of the data.

Most importantly, the form data now appears after the empty line that ends the headers, as the body of the request.

However, the form data is still URL-encoded.


URL Encoding

URL encoding is a method of wrapping up form-data in a way that will make a legal URL for a GET request.

We have seen that the encoded data consists of a sequence of name=value pairs, separated by &.

In the last example we saw that spaces are replaced by +.

Most other non-alphanumeric characters are converted to the form %XX, where XX is a two-digit hexadecimal code.

In particular, line breaks in multi-line form data (e.g. addresses) become %0D%0A—the hex ASCII codes for a carriage-return, new-line sequence.

URL encoding is somewhat redundant for the POST method, but it is the default anyway.
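On the server side the encoding has to be undone. A minimal sketch, assuming the standard java.net.URLDecoder and a hypothetical FormDecoder class, parses either a GET query string or a URL-encoded POST body into a map from names to lists of values. A list is used because nothing requires a form to map a name to a single value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses URL-encoded form data ("name=value&name=value")
// into a map from each field name to the list of values submitted for it.
class FormDecoder {
    static Map<String, List<String>> parse(String data) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        if (data == null || data.isEmpty()) {
            return result;
        }
        for (String pair : data.split("&")) {
            if (pair.isEmpty()) continue;            // tolerate "a=1&&b=2"
            int eq = pair.indexOf('=');
            String name = decode(eq >= 0 ? pair.substring(0, eq) : pair);
            String value = decode(eq >= 0 ? pair.substring(eq + 1) : "");
            result.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    // URLDecoder turns '+' back into a space and %XX into the encoded character.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("surname=Carpenter&forenames=David+Bryan"));
        // prints: {surname=[Carpenter], forenames=[David Bryan]}
        System.out.println(parse("pets=dog&pets=bird"));
        // prints: {pets=[dog, bird]}
    }
}
```

The same call also turns the %0D%0A sequences produced by multi-line text areas back into carriage-return/newline pairs.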


More Options for the input Tag

We can make a group of radio buttons in an HTML form by using a set of input tags with the type attribute set to radio.

Tags belonging to the same button group should have the same name attribute, and distinct value attributes, e.g.:

<form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
Favorite primary color:
<p>
Red: <input type=radio name=color value=red>
Blue: <input type=radio name=color value=blue>
Green: <input type=radio name=color value=green>
<p>
<input type=submit>
</form>


Radio Buttons

The message sent to the server is:

...
Content-type: application/x-www-form-urlencoded
Content-Length: 10

color=blue


Checkboxes

<form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
What pets do you own?
<p>
<input type=checkbox name=pets value=dog checked> Dog
<br>
<input type=checkbox name=pets value=cat> Cat
<br>
<input type=checkbox name=pets value=bird> Bird
<br>
<input type=checkbox name=pets value=fish> Fish
<p>
<input type=submit>
</form>

Example from “HTML and XHTML: The Definitive Guide”, O’Reilly.


Checkboxes

The message posted to the server is: ...

pets=dog&pets=bird

Note there is no requirement that a form map a name to a unique value.


File-Selection

You can name a local file in an input element, and have the entire contents of the file posted by the browser to the server.

This is not allowed using the default URL-encoding for form data. Instead you must specify multi-part MIME encoding in the form element, e.g.:

<form method=post enctype="multipart/form-data"
      action="http://sirah.csit.fsu.edu:8080/dummy">
Course: <input name=course size=20>
<p>
Students file: <input type=file name=students>
<p>
<input type=submit>
</form>



File-Selection Entry

With multi-part encoding, the data is no longer sent on a single line.

On submission the DummyServer prints. . .


Output of DummyServer on submit

POST /dummy HTTP/1.0
Referer: http://sirah.csit.fsu.edu/users/dbc/forms/form5.html
...
Content-type: multipart/form-data; boundary=---------------------------269912718414714
Content-Length: 455

-----------------------------269912718414714
Content-Disposition: form-data; name="course"

CIS6930
-----------------------------269912718414714
Content-Disposition: form-data; name="students"; filename="students"

wcaofloraFulaygao...zhao6930zheng

-----------------------------269912718414714--


Remarks

Each form field has its own section in the posted file, separated by a delimiter specified in the Content-type field of the header.

Within each section there are one or more header lines, followed by a blank line, followed by the form data.

The values can contain binary data. There is no “URL-encoding”.
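The section structure above can be made concrete with a short Java sketch. As the DummyServer output shows, each delimiter line is -- followed by the boundary string, and the last one has a further -- appended. MultipartSketch is a hypothetical class; for simplicity it assumes the body has already been read into a String and contains only text. Because real uploads can contain binary data, a production parser would work on bytes rather than characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: splits a text-only multipart/form-data body into parts.
// Each delimiter line is "--" + boundary; the final one has "--" appended.
class MultipartSketch {
    static class Part {
        final Map<String, String> headers = new LinkedHashMap<>();
        String data;
    }

    static List<Part> parse(String body, String boundary) {
        List<Part> parts = new ArrayList<>();
        String[] sections = body.split(Pattern.quote("--" + boundary));
        for (String section : sections) {
            if (section.startsWith("--")) break;        // final delimiter reached
            if (section.trim().isEmpty()) continue;      // nothing before the first delimiter
            // Skip the CRLF that ends the delimiter line.
            String s = section.startsWith("\r\n") ? section.substring(2) : section;
            int blank = s.indexOf("\r\n\r\n");           // headers end at an empty line
            if (blank < 0) continue;
            Part part = new Part();
            for (String line : s.substring(0, blank).split("\r\n")) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    part.headers.put(line.substring(0, colon).trim(),
                                     line.substring(colon + 1).trim());
                }
            }
            String data = s.substring(blank + 4);
            // The CRLF just before the next delimiter belongs to the delimiter.
            part.data = data.endsWith("\r\n") ? data.substring(0, data.length() - 2) : data;
            parts.add(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        String body = "--XYZ\r\n"
            + "Content-Disposition: form-data; name=\"course\"\r\n\r\n"
            + "CIS6930\r\n"
            + "--XYZ--\r\n";
        for (Part p : parse(body, "XYZ")) {
            System.out.println(p.headers + " -> " + p.data);
        }
        // prints: {Content-Disposition=form-data; name="course"} -> CIS6930
    }
}
```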


Masked and Hidden fields

The input to a text field can be masked by setting the type attribute to password. The entered text will not be echoed to the screen.

If the type attribute is set to hidden, the input field is not displayed at all. This kind of field is often used in HTML forms dynamically generated by CGI scripts.

Hidden fields allow the CGI scripts to keep track of “session” information over an interaction that involves multiple forms: hidden fields may contain values characterizing the session.
– Use of hidden fields will be one of the topics in the lectures on servlets.


Text Areas

Similar to text input fields, but allow multi-line input.

Included in a form by using the textarea tag, e.g.:

<textarea name=address cols=40 rows=3> . . . optional default text goes here . . .

</textarea>

With default (URL) encoding, lines of input are separated by carriage return/newline, coded as %0D%0A.


Text Area Input

Data posted to server:

address=Bryan+Carpenter%0D%0ACSIT%2C+FSU%0D%0ATallahassee%2C+FL+32306-4120


Scrollable Menus (Lists)

For long lists of options, when checkboxes become too tedious:

<select name=pets size=3 multiple>
<option value=dog> Dog
<option value=cat> Cat
<option value=bird> Bird
<option value=fish> Fish
</select>

The value attribute in the option tag is optional: if it is omitted, the value returned is the displayed string immediately following the tag.

Without the multiple attribute, only a single option can be selected.


List Input

The message posted to the server is: ...

pets=dog&pets=bird


Conventional CGI


Handling Form Data on the Server

In conventional CGI programming, the URL in the action attribute of a form will identify an executable file somewhere in the Web Server’s document hierarchy.

A common server convention is that these executables live in cgi-bin/ or one of its subdirectories.

The executable file may be written in any language. For definiteness we will assume it is written in Perl, and refer to it as a CGI script.

The Web Server program will invoke the CGI script, and pass it the form data, either through environment variables or by piping data to standard input of the script.

The CGI script generates a response to the form, which is piped to the Web server through its standard output, then returned to the browser.


Operation of a CGI Script

At the most basic level, a CGI script must:
– Parse the input (the form data) from the server, and
– Generate a response.

Most often the response is the text of a dynamically generated HTML document, preceded by some HTTP headers.
– In practice the only required HTTP header is the Content-type header. The Web Server will fill in other necessary headers automatically.

Even if there is no meaningful response to the input data, the CGI script must output an empty message, or some error message.
– Otherwise the server will not close the connection to the client, and a browser error will occur.


“Hello World” CGI Script

In the directory /home/httpd/cgi-bin/users/dbc on sirah, I create the file hello.pl, with contents:

#!/usr/bin/perl

print "Content-type: text/html\n\n" ;

print "<html><body><h1>Hello World!</h1></body></html>" ;

I mark this file world readable, and mark it executable:

sirah$ chmod o+r hello.pl
sirah$ chmod +x hello.pl

Now I point my browser at the URL: http://sirah/cgi-bin/users/dbc/hello.pl


Output from CGI Script

The novel feature here is that the HTML was dynamically generated: it was printed out on the fly by the Perl script.


Retrieving Form Data

Several environment variables are set up by the server to pass information about the request to the Perl script.

If the form data was sent using a GET request, the most important is QUERY_STRING, which contains all the text in the URL following the first ? character.

If the form data was sent using a POST request, the environment variable CONTENT_LENGTH contains the length in bytes of the posted data. To retrieve this data, these bytes are read from the standard input of the script.
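The same mechanism can be used from a program written in Java instead of Perl. The following is only a sketch: the class CgiEcho is hypothetical, it assumes the Web server also sets the standard CGI variable REQUEST_METHOD (not discussed above), and in practice such a program would be launched through a small shell script that starts a JVM for every request, which is exactly the cost servlets avoid. Like the Perl examples that follow, it echoes the raw form data without decoding or escaping it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical CGI program in Java. The Web server passes the request through
// environment variables and (for POST) standard input; the program's standard
// output becomes the response.
class CgiEcho {
    // Returns the raw (still URL-encoded) form data for a GET or POST request.
    static String readFormData(String method, String queryString,
                               String contentLength, InputStream in) {
        if (!"POST".equalsIgnoreCase(method)) {
            return queryString == null ? "" : queryString;   // GET: data is in QUERY_STRING
        }
        int length = contentLength == null ? 0 : Integer.parseInt(contentLength.trim());
        byte[] buf = new byte[length];
        int off = 0;
        try {
            while (off < length) {                           // POST: read CONTENT_LENGTH bytes
                int n = in.read(buf, off, length - off);
                if (n < 0) break;                            // input ended early
                off += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf, 0, off, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String data = readFormData(System.getenv("REQUEST_METHOD"),
                                   System.getenv("QUERY_STRING"),
                                   System.getenv("CONTENT_LENGTH"),
                                   System.in);
        // The Content-type header, followed by an empty line, must come first.
        System.out.print("Content-type: text/html\n\n");
        System.out.println("<html><body><h1>Hello " + data + "!</h1></body></html>");
    }
}
```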


GET example

I change our first form to submit data to a CGI script:

<form method=get
      action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/getEg.pl">
Name: <input type=text name=who size=32>
<p>
<input type=submit>
</form>

and define getEg.pl by:

#!/usr/bin/perl
print "Content-type: text/html\n\n" ;
print "<html><body><h1>Hello $ENV{QUERY_STRING}!</h1></body></html>\n" ;

When I point the browser at the form, enter my name, and submit the form, the page returned to the browser contains the message: Hello who=Bryan!


POST example

Change the form as follows:

<form method=post
      action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/postEg.pl">
Name: <input type=text name=who size=32>
<p>
<input type=submit>
</form>

and define postEg.pl by:

#!/usr/bin/perl
print "Content-type: text/html\n\n" ;

# Read CONTENT_LENGTH bytes of form data from standard input
for($i = 0 ; $i < $ENV{CONTENT_LENGTH} ; $i++) {
  $in .= getc ;
}

print "<html><body><h1>Hello $in!</h1></body></html>\n" ;


Using the CGI module

The previous examples illustrate the underlying mechanisms used to communicate between the server and the CGI program.

One could go on to use the text processing features of Perl to parse the form data and generate meaningful responses.

In modern Perl you can (and presumably should) use the CGI module to hide many of these details, especially the extraction of form parameters.


CGI module example

Change the form as follows:

<form method=post
      action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/CGIEg.pl">


and define CGIEg.pl by:

#!/usr/bin/perl

use CGI qw( :standard) ;

$name = param("who") ;
print "Content-type: text/html\n\n" ;

print "<html><body><h1>Hello $name!</h1></body></html>\n" ;

Now the browser gets a more friendly message like: Hello Bryan!


Getting Started with Servlets


Server Software

Standard Web servers typically need some additional software to allow them to run servlets. Options include:
– Apache Tomcat
The official reference implementation for the servlet 2.2 and JSP 1.1 specifications. It can stand alone or be integrated into the Apache Web server.
– JavaServer Web Development Kit (JSWDK)
A small standalone Web server mainly intended for servlet development.
– Sun’s Java Web Server
An early server supporting servlets. Now apparently obsolete.
– Allaire JRun, New Atlanta’s ServletExec, . . .


Tomcat

In these lectures we will use Apache Tomcat for examples.

For debugging of servlets it seems to be necessary to use a stand-alone server, dedicated to the application you are developing.
– The current architecture of servlets makes revision of servlet classes already loaded in a Web server either disruptive or expensive. In general you need to establish that your classes are working smoothly before they are deployed in a production server.

Hence you will be encouraged to install your own private server for developing Web applications.

Tomcat is the flagship product of the Jakarta project, which produces server software based on Java.


Typical Modes of Operation of Tomcat

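[Figure: typical modes of operation of Tomcat, each showing a browser client sending a request that is handled by a servlet. 1. Stand-alone: Tomcat serves the browser directly (port 8080). 2. In-process servlet container: Tomcat runs inside the Apache server (port 80). 3. Out-of-process servlet container: Apache (port 80) forwards requests to a separate Tomcat process (port 8007).]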


Downloading Tomcat

Go to the Jakarta home-page: http://jakarta.apache.org

Follow the link for downloading binaries. Under the heading Release Builds, follow the Tomcat X.X link. Get the file jakarta-tomcat-X.X.tar.gz.


Unpacking and Setting the Environment

Unpack the compressed file, e.g.:

gunzip -c jakarta-tomcat-X.X.tar.gz | tar xvf -

Set the environment variables TOMCAT_HOME and JAVA_HOME, e.g.:

export TOMCAT_HOME=$HOME/jakarta-tomcat-X.X

export JAVA_HOME=/usr/java/jdk1.Y.Y

Most likely you will also want to add these commands to your .bashrc file.


Servers on Course Hosts: Ground Rules

The system manager would like to be able to keep track of who is running what Web server.
– Also we want to avoid overloading the course hosts.

You will each be allocated a port number on one of the three course hosts. Please stick with this port number and host for your main server.
– You can run additional servers on random port numbers for brief experiments, but please not for extended periods.
– Of course, avoid port numbers allocated to other students!

Your Tomcat home directory should be directly nested in your top-level home directory.
– The management reserves the right to read and modify your server configuration if it seems to be causing problems.


Choosing a Port

Edit the file jakarta-tomcat-X.X/conf/server.xml. Find the Connector element that defines the parameters of the HTTP connection handler. It looks like:

<Connector className=". . .">
  <Parameter name="handler"
             value=". . . .HttpConnectionHandler"/>
  <Parameter name="port" value="8080"/>
</Connector>

If you are using a course host, change the value of the port parameter from its default 8080 to a port number you have been allocated.


Configuring the AJP Connector

In the file jakarta-tomcat-X.X/conf/server.xml you will also find a Connector element defining the parameters of an “AJP connection handler” (used for interactions with an Apache server). It looks like:

<Connector className=". . .">
  <Parameter name="handler"
             value=". . . .Ajp12ConnectionHandler"/>
  <Parameter name="port" value="8007"/>
</Connector>

If you are using a course host, change the value of the port parameter from its default 8007 to a value unique to you, e.g. a port number one greater than your HttpConnectionHandler port.

Even if you are not going to use the Apache connection, the shutdown.sh script also uses this port, so the connection handler is still required.


Starting and Stopping your Server

If you are using a course host, these operations should be done on the host on which you have been allocated a port to run your main server.

To start your server run the script: jakarta-tomcat-X.X/bin/startup.sh

To stop your server run the script: jakarta-tomcat-X.X/bin/shutdown.sh

If for any reason this fails, simply find the java process and kill it.


Check Your Server is Running

If you are running your server on a course host, and your allocated host/port pair is host/XXXX, point your browser at the URL: http://host.csit.fsu.edu:XXXX

You should see the default Tomcat home page. In the Tomcat 3.1 release, the file for this home page is at:

jakarta-tomcat-X.X/webapps/ROOT/index.html


First Servlets


Creating a Context

Before writing a servlet, you need a place to put it.

Shut down your server, if it is running. In the file jakarta-tomcat-X.X/conf/server.xml, find the example Context elements. Add a new Context element such as:

<Context path="/dbc" docBase="webapps/dbc/" debug="0" reloadable="true"> </Context>

– The path attribute defines a logical path that appears in the URL.
– The docBase attribute defines the physical directory where HTML and servlets live.
– Be careful to put /s in all the right places!
– The reloadable flag is supposed to allow servlet classes to be reloaded into a running server if they have been modified. We set it true because that is the recommended default during development. Note, however, that it does not work very reliably!


Creating a Document Directory

This can be created as a subdirectory of jakarta-tomcat-X.X/webapps/

With the server configuration defined above, I create a subdirectory: jakarta-tomcat-X.X/webapps/dbc/

This will be the root directory for my HTML documents.

To check my configuration is working properly, I can put a file index.html in dbc/, restart my server, and point my browser at: http://host.csit.fsu.edu:XXXX/dbc

where host/XXXX is my host/port pair. I should see the contents of the HTML file.


A Directory for Servlet Classes

Now I create the subdirectories:

jakarta-tomcat-X.X/webapps/dbc/WEB-INF/

and

jakarta-tomcat-X.X/webapps/dbc/WEB-INF/classes/

The latter directory is where I put class files and package subdirectories for servlets.

The WEB-INF subdirectory will not be directly visible to browsers as a document directory.


A “Hello World” Servlet

import java.io.* ;
import javax.servlet.* ;
import javax.servlet.http.* ;

public class HelloWorld extends HttpServlet {

  public void doGet(HttpServletRequest request,
                    HttpServletResponse response)
      throws IOException, ServletException {

    // The content type must be set before writing to the response
    response.setContentType("text/html") ;

    PrintWriter out = response.getWriter() ;

    out.println("<html><body>") ;
    out.println("<h1>Hello World!</h1>") ;
    out.println("</body></html>") ;
  }
}


Remarks

This program should be contained in a file HelloWorld.java, which may be placed in the classes/ subdirectory.

HttpServlet is the base class for servlets running in HTTP servers. Although servlets can be written for other kinds of server, in reality servlets are nearly always HttpServlets.

The doGet() method is called in response to an HTTP GET request directed at the servlet.

As the names suggest, the arguments describe the browser’s request and the servlet’s response.

Before writing to the output stream associated with the response, the content type header (at least) must be set.


Setting the Class Path

Before compiling servlet code you will have to set the class path to include some related libraries.
– The server apparently also needs the class path to be set at the time it is started, before it can run servlets.

Shut down the server again. Set your class path to include some necessary jar files, e.g.:

export CLASSPATH=$TOMCAT_HOME/lib/servlet.jar:\
$TOMCAT_HOME/lib/jasper.jar:\
$TOMCAT_HOME/lib/jaxp.jar

Again, probably add this command to your .bashrc file.
– A back-slash, \, at the end of a line is a line-continuation character (it “escapes” the EOL). Do not include it if you type the whole command on one line!
– To avoid grief in the future, also make sure now that the working directory is on your class path, e.g.:

export CLASSPATH=$CLASSPATH:.

Restart the server.


Compiling and Deploying the Servlet

This is straightforward:

javac HelloWorld.java

You should now be able to view the servlet. In my case I point my browser at the URL:

http://host.csit.fsu.edu:XXXX/dbc/servlet/HelloWorld

Note that by default the Tomcat server will run with the same privileges as the user who started it.

This means you don’t actually need to make files world readable (because you have privileges to read them).

It also means you have to be careful. If you stick with this default you must never deploy servlets that have the power to damage or compromise your account, e.g. by reading or writing arbitrary files, or executing random commands!


A Servlet that Reads a Parameter

Define a new servlet class called HelloUser. This is identical to the class HelloWorld, except that the line:

out.println("<h1>Hello World!</h1>") ;

is replaced with

out.println("<h1>Hello " + request.getParameter("who") + "!</h1>") ;
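One caveat: getParameter() returns null if the request has no parameter with that name, and echoing user input straight into HTML lets a user inject arbitrary markup. A minimal sketch of a safer greeting, written in plain Java with a hypothetical helper class HtmlUtil (not part of the servlet API) and a hypothetical default name "stranger", is:

```java
// Hypothetical helper (not part of the servlet API) for echoing a request
// parameter back into an HTML page safely.
class HtmlUtil {
    // Replaces the characters that have special meaning in HTML.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // getParameter() returns null when the parameter is absent, so supply a default.
    static String greeting(String who) {
        String name = (who == null || who.isEmpty()) ? "stranger" : escape(who);
        return "<h1>Hello " + name + "!</h1>";
    }

    public static void main(String[] args) {
        System.out.println(greeting("Bryan"));          // prints: <h1>Hello Bryan!</h1>
        System.out.println(greeting("<b>Bryan</b>"));   // prints: <h1>Hello &lt;b&gt;Bryan&lt;/b&gt;!</h1>
        System.out.println(greeting(null));             // prints: <h1>Hello stranger!</h1>
    }
}
```

In HelloUser the output line would then become out.println(HtmlUtil.greeting(request.getParameter("who"))) ;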


First Form using a Servlet

In the directory jakarta-tomcat-X.X/webapps/dbc/ I place an HTML file hello.html containing the form element:

<form method=get
      action="http://sirah.csit.fsu.edu:8081/dbc/servlet/HelloUser">


This assumes my host/port pair is sirah/8081. To view this form, I point my browser at the URL:

http://sirah.csit.fsu.edu:8081/dbc/hello.html

If I enter my name and submit the form, I get back a page containing the message: Hello Bryan!


The Servlet Life Cycle


Servlet Classes

Any servlet class implements the interface javax.servlet.Servlet.

This interface defines a few low-level methods, including the low-level request-handling method, service().
– Perhaps the only method from Servlet you will use explicitly is getServletConfig().

All servlets we will be concerned with are extended from the base class javax.servlet.http.HttpServlet (which implements Servlet).


Servlet Instances

By default, (at most) one instance of a given servlet class will ever be created by a Web server process.

By default, the servlet class is loaded into the Web server’s JVM, and the unique servlet instance is created, the first time any client sends a request to a URL identifying the servlet class.

Subsequent requests to the same URL are all handled by the same servlet class instance.
– By default, however, each request is handled in a different Java thread.

This means that a later request can access results of processing an earlier request through values of instance variables (or class variables).


The init() Method

The init() method:

public void init() throws ServletException {. . .}

is quite analogous to the init() method on applets. It is called once when the servlet is created. You override it to define initialization code for your servlet instance.

As with applets, this is used in preference to defining a non-default constructor, because you are allowed to access initialization parameters inside init() (but not in a constructor).
– There is another lower-level init() method:

public void init(ServletConfig config) throws ServletException {. . .}

Don’t override it. Instead, if you need a ServletConfig during initialization, call getServletConfig() in the body of the no-argument init().


The Request Handling Methods

These are where you put the code that handles HTTP requests to URL of the servlet.

The available request-handling methods are:

doGet()       Handle HTTP GET request.
doPost()      Handle HTTP POST request.
doPut()       Handle HTTP PUT request.
doDelete()    Handle HTTP DELETE request.
doOptions()   Handle HTTP OPTIONS request.
doTrace()     Handle HTTP TRACE request.

– Note there is no doHead().

These have the generic signature:

protected void doXxx(HttpServletRequest req,
                     HttpServletResponse resp)
    throws ServletException, IOException {. . .}


Last Modification Date

When a browser reloads a page, it can include an If-Modified-Since header.

If the document has not been modified since the specified date, the server response will be a simple “Not Modified” status code (no data).

For dynamically generated content, OS date-stamps on document files are not enough to determine whether the effective content will be different.

Instead a servlet can override:

protected long getLastModified(HttpServletRequest req) {. . .}

and thus take advantage of browser caching.
– The returned date is in standard Java representation: milliseconds since midnight, 1 January 1970 (GMT).
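HttpServlet makes this comparison itself when getLastModified() is overridden, so a servlet does not normally write it. Purely to show the logic, here is a plain-Java sketch of the test; the class ConditionalGet is hypothetical, and it assumes the header value uses the RFC 1123 date format that browsers normally send.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of the If-Modified-Since test a server applies, given a
// last-modification time in milliseconds since 1 January 1970 (GMT).
class ConditionalGet {
    // Parser for RFC 1123 dates such as "Thu, 01 Jan 1970 00:00:10 GMT".
    static SimpleDateFormat httpDateFormat() {
        SimpleDateFormat f = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f;
    }

    // True if a "304 Not Modified" response can be sent instead of the page.
    static boolean notModified(long lastModified, String ifModifiedSince) {
        if (lastModified < 0 || ifModifiedSince == null) {
            return false;                  // unknown date or no header: send the page
        }
        try {
            long since = httpDateFormat().parse(ifModifiedSince).getTime();
            // HTTP dates have one-second resolution, so round down before comparing.
            return (lastModified / 1000) * 1000 <= since;
        } catch (ParseException e) {
            return false;                  // unparseable header: send the page
        }
    }

    public static void main(String[] args) {
        String header = "Thu, 01 Jan 1970 00:00:10 GMT";   // 10000 ms after the epoch
        System.out.println(notModified(10500L, header));   // prints: true
        System.out.println(notModified(11000L, header));   // prints: false
    }
}
```

A new SimpleDateFormat is created for each call because instances of that class are not safe to share between threads, and servlet requests run concurrently.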


The destroy() method.

Finally a servlet can also override:

public void destroy() {. . .}

If the Web server terminates gracefully, it will invoke destroy() on all servlet instances it holds before shutting down.

In principle, this is a place where you can put code to back-up the current state of the servlet to persistent storage. The servlet can restart from restored state when the Web server is restarted.

In practice, servers (especially Tomcat!) often terminate “ungracefully”, when the system crashes or the server process is killed. Relying on destroy() methods being called is probably not advisable.


A Counter Servlet

import java.io.* ;
import javax.servlet.* ;
import javax.servlet.http.* ;

public class Counter extends HttpServlet {
  int count = 0 ;

  public void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException, ServletException {
    resp.setContentType("text/html") ;
    PrintWriter out = resp.getWriter() ;
    out.println("<html><head></head><body>") ;
    out.println("This servlet instance has been accessed " +
                (count++) + " times") ;
    out.println("</body></html>") ;
  }
}

Example taken from “Java Servlet Programming”, O’Reilly.


Remarks

The first time I point my browser at this servlet (e.g. at http://sirah.csit.fsu.edu:8081/dbc/servlet/Counter), I get a response page containing the message: This servlet instance has been accessed 0 times

Each time I reload the URL, the count increases. Since count is an instance variable of the class, this illustrates that indeed only a single instance of Counter is created.

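This servlet is not completely reliable, because the server may process concurrent requests in different threads, and all of those threads share the instance variable count. Since count++ reads, increments, and writes back the variable in separate steps, two threads can interleave and interfere: both may report the same count, and an increment may be lost.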


Mutual Exclusion

In general, any access to
– servlet instance variables,
– servlet class variables,
– external files, etc.
that may be modified by any HTTP request on the servlet should be guarded by synchronized methods or a synchronized statement. This is very important!

For example, the increment of count could be done in a synchronized statement:

int myCount ;
synchronized(this) { myCount = count++ ; }

Subsequently the local variable myCount—which is private to the thread—is printed in the response.
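The synchronized idiom above can be exercised outside a servlet container. In this sketch, CounterDemo.hammer() is a hypothetical harness that plays the role of many concurrent requests, and AtomicCounter shows java.util.concurrent.atomic.AtomicInteger, a lock-free alternative that arrived in Java 5, after these notes were written.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// The slide's idiom: take a private copy of the count inside a synchronized block.
class SafeCounter {
    private int count = 0;

    int nextSynchronized() {
        int myCount;
        synchronized (this) {
            myCount = count++;   // the read-modify-write happens under the lock
        }
        return myCount;          // the local copy can be used freely outside the lock
    }
}

// Lock-free alternative available from Java 5 onwards.
class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger();

    int next() {
        return count.getAndIncrement();
    }
}

class CounterDemo {
    /** Simulates threads * perThread concurrent "requests"; returns the set of counts handed out. */
    static Set<Integer> hammer(IntSupplier counter, int threads, int perThread) {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(counter.getAsInt());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("requests did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

If nextSynchronized() is replaced by a bare count++, the same harness will usually (though not reliably) hand out fewer than 8000 distinct values, which is the interference described above.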


Registering Servlet Instances

In simple cases we don’t need to explicitly register servlets with the Web server. Instances will simply be created on demand.

However, registering servlets has various advantages:
– we can give the servlets meaningful names, or map them to simpler URL addresses,
– we can create multiple instances of the same servlet class, with different names,
– we can set initialization parameters for the instance,
etc.

With Tomcat, servlets can be registered by creating entries in an XML file called web.xml, which is placed in the WEB-INF/ subdirectory for your context.


Example Registering a Servlet

I copy the example file:

jakarta-tomcat-X.X/webapps/examples/WEB-INF/web.xml

to my personal context directory: jakarta-tomcat-X.X/webapps/dbc/WEB-INF/

I delete the existing <servlet>. . .</servlet> and <servlet-mapping>. . .</servlet-mapping> elements from my copy, and replace them with:

<servlet>
  <servlet-name>counter1</servlet-name>
  <servlet-class>Counter</servlet-class>
</servlet>

I restart the server.


Multiple Instances

I can view the registered servlet at the URL:

http://sirah.csit.fsu.edu:8081/dbc/servlet/counter1

Now add a second servlet element to the web.xml file:

<servlet>
  <servlet-name>counter2</servlet-name>
  <servlet-class>Counter</servlet-class>
</servlet>

After restarting the server again, I find that the access counts for the original servlet and for the second servlet, at

http://sirah.csit.fsu.edu:8081/dbc/servlet/counter2

are updated independently.
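To make the "one instance per registered name" behaviour concrete, here is a toy model in plain Java. It is not how Tomcat is implemented: CounterLike is a hypothetical stand-in for the Counter servlet, and the registry simply creates one instance per servlet-name the first time that name is requested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model of servlet registration: one instance per registered name, created on demand.
class ToyRegistry {

    /** Stand-in for the Counter servlet: each instance keeps its own count. */
    static class CounterLike {
        private int count = 0;

        synchronized int access() { return count++; }
    }

    private final Map<String, Supplier<CounterLike>> classes = new HashMap<>();
    private final Map<String, CounterLike> instances = new HashMap<>();

    /** Analogous to a <servlet> element: associates a servlet-name with a class. */
    synchronized void register(String servletName, Supplier<CounterLike> servletClass) {
        classes.put(servletName, servletClass);
    }

    /** Analogous to a request for .../servlet/<servletName>: reuses the instance if it exists. */
    synchronized CounterLike lookup(String servletName) {
        return instances.computeIfAbsent(servletName, name -> {
            Supplier<CounterLike> factory = classes.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("no servlet named " + name);
            }
            return factory.get();
        });
    }
}
```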


Initialization Parameters

A new counter servlet, defining an init() method:

public class InitCounter extends HttpServlet {
  int count ;

  public void init() throws ServletException {
    ServletConfig config = getServletConfig() ;
    try {
      count = Integer.parseInt(config.getInitParameter("initial")) ;
    } catch (NumberFormatException e) {
      count = 0 ;
    }
  }

  public void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws . . . { . . . }
}
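The parse-with-default logic in init() can be pulled out and tested on its own. In this sketch a java.util.function.Function stands in for config::getInitParameter, an assumption made so the example runs without a servlet container; InitParams is a hypothetical name. Note that Integer.parseInt(null) throws NumberFormatException, which is why the catch block in InitCounter also covers a missing parameter.

```java
import java.util.function.Function;

class InitParams {
    /**
     * Reads an integer initialization parameter, falling back to defaultValue when the
     * parameter is missing or is not a valid integer.
     * lookup plays the role of ServletConfig.getInitParameter (null if absent).
     */
    static int intParam(Function<String, String> lookup, String name, int defaultValue) {
        String raw = lookup.apply(name);
        if (raw == null) {
            return defaultValue;                  // no <init-param> with this name
        }
        try {
            return Integer.parseInt(raw.trim());  // tolerate stray whitespace in web.xml
        } catch (NumberFormatException e) {
            return defaultValue;                  // e.g. <param-value>fifty</param-value>
        }
    }
}
```

In InitCounter.init() this could become count = InitParams.intParam(getServletConfig()::getInitParameter, "initial", 0) ; (method references need Java 8 or later).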


Defining Initialization Parameters

In web.xml, I add the element:

<servlet>
  <servlet-name>counter</servlet-name>
  <servlet-class>InitCounter</servlet-class>
  <init-param>
    <param-name>initial</param-name>
    <param-value>50</param-value>
  </init-param>
</servlet>

Now when I restart the server and point my browser at, say: http://sirah.csit.fsu.edu:8081/dbc/servlet/counter

I get a response page containing the message: This servlet has been accessed 50 times


Handling Requests


Reading Form Data

Servlets make reading form data easy (at least in common cases).

If a particular parameter name is known to have only a single value, one can just apply the method:

public String getParameter(String name) {. . .}

to the HttpServletRequest parameter of the doGet() or doPost() method. It returns null if the parameter is absent; if the parameter does in fact have several values, it returns the first of them.

Use of this method was illustrated earlier in the HelloUser example. Note that parameter names are case sensitive.
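Behind getParameter(), the container decodes the URL-encoded query string (or POST body). The following sketch does the same decoding with java.net.URLDecoder so the behaviour can be seen in isolation. It assumes UTF-8 and ignores character-set negotiation; QueryString and its helpers are hypothetical names, not the servlet API.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QueryString {
    /** Decodes "a=1&b=2&b=3" into name -> values, preserving order. */
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;                                  // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? "" : pair.substring(eq + 1);
            params.computeIfAbsent(decode(name), k -> new ArrayList<>()).add(decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);   // '+' becomes a space, %XX is decoded
    }

    /** Like getParameter(): the first value, or null if absent (names are case sensitive). */
    static String getParameter(Map<String, List<String>> params, String name) {
        List<String> values = params.get(name);
        return (values == null) ? null : values.get(0);
    }

    /** Like getParameterValues(): all values, or null if absent. */
    static String[] getParameterValues(Map<String, List<String>> params, String name) {
        List<String> values = params.get(name);
        return (values == null) ? null : values.toArray(new String[0]);
    }
}
```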


Uniform support for GET and POST

The parameter-reading methods behave the same for GET and POST requests.

It is natural to support both kinds of request with the same code.

To do this, simply have doGet() dispatch to doPost(), or vice versa. For example:

public void doGet(HttpServletRequest request, HttpServletResponse response)
    throws IOException, ServletException {
  doPost(request, response) ;
}


Determining the HTTP Method

The getMethod() method on the HttpServletRequest returns the name of the HTTP method (GET, POST, HEAD, etc.) used in the request.

For example, when a client sends the HTTP HEAD request, the server is supposed to treat it like a GET request, but return the headers only—not the data.

The server will automatically discard any data doGet() writes, but (if you had the urge) you could make things a bit more efficient as follows:

public void doGet(HttpServletRequest request, HttpServletResponse response)
    throws IOException, ServletException {
  . . . set headers . . .
  if(request.getMethod().equals("HEAD")) return ;
  . . . return data . . .
}


Information from Request Headers

getMethod() is one of a series of convenience methods that read information from the request line and headers.

Others include:

getRequestURI(), getProtocol()     Request line (first line of the request)
getContentLength()                 Content-Length header
getContentType()                   Content-Type header
getAuthType(), getRemoteUser()     Authorization header
getCookies()                       Cookie header (see later)


Reading Request Headers Directly

The preceding methods are not exhaustive. If you know the name of the header you want, use

String getHeader(String name)

Header names are not case sensitive. For headers (e.g. Accept-Language) that can appear multiple times in a given request, use:

java.util.Enumeration getHeaders(String name)

To simply enumerate all headers of a given request, use

java.util.Enumeration getHeaderNames()

in conjunction with getHeader().
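A minimal sketch of the bookkeeping behind getHeader() and getHeaders(): header lines are split at the first colon and stored in a case-insensitive multimap, since header names are not case sensitive and some may repeat. It assumes the header block has already been separated from the request line, and it ignores obsolete folded continuation lines; RawHeaders is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RawHeaders {
    /** Parses "Name: value" lines up to the first blank line; names compare case-insensitively. */
    static Map<String, List<String>> parse(String headerBlock) {
        Map<String, List<String>> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        try (BufferedReader in = new BufferedReader(new StringReader(headerBlock))) {
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon <= 0) {
                    continue;                            // not a header line; skipped in this sketch
                }
                String name = line.substring(0, colon).trim();
                String value = line.substring(colon + 1).trim();
                headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);           // not expected from a StringReader
        }
        return headers;
    }

    /** Like getHeader(): the first value, or null if the header is absent. */
    static String getHeader(Map<String, List<String>> headers, String name) {
        List<String> values = headers.get(name);
        return (values == null) ? null : values.get(0);
    }

    /** Like getHeaders(): every value in order, or an empty list if absent. */
    static List<String> getHeaders(Map<String, List<String>> headers, String name) {
        return headers.getOrDefault(name, Collections.emptyList());
    }
}
```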


Displaying All Headers

import java.io.* ;
import java.util.* ;
import javax.servlet.* ;
import javax.servlet.http.* ;

public class Headers extends HttpServlet {
  public void doPost(HttpServletRequest req, HttpServletResponse resp)
      throws IOException, ServletException {
    resp.setContentType("text/html") ;
    PrintWriter out = resp.getWriter() ;
    out.println("<html><head></head><body>") ;
    Enumeration headers = req.getHeaderNames() ;
    while(headers.hasMoreElements()) {
      String name = (String) headers.nextElement() ;
      out.println(name + "<br>" + req.getHeader(name) + "<br><br>") ;
    }
    out.println("</body></html>") ;
  }
}


HTTP 1.1 Request Headers

Accept: MIME types the browser can handle.
Accept-Charset: Character sets the browser can handle.
Accept-Encoding: Encodings (e.g. gzip) the browser can handle.
Accept-Language: Preferred languages, e.g. English (en).
Authorization: User ID/password.
Cache-Control: Caching directives, e.g. for proxy servers.
Connection: Can the browser keep connections alive?
Content-Length: Length of POSTed data.
Content-Type: MIME type of POSTed data.
Cookie: Cookies previously received from this site.
Expect: Client wants the server to confirm (100 Continue) before it sends a request body.
From: Email address of requester.
Host: Host/port information from the original URL.


HTTP 1.1 Request Headers (cont.)

If-Match:
If-Modified-Since: Only send data that has changed since the given date.
If-None-Match:
If-Range:
If-Unmodified-Since: Used with PUT.
Pragma: Only standard value is no-cache.
Proxy-Authorization:
Range: Get part of document.
Referer: Set if the request came from a link on a Web page.
Upgrade: Change protocol.
User-Agent: Identifies browser.
Via: Set by gateways and proxies.
Warning:


Multiple-valued Parameters

If a form parameter can have more than one value (e.g. a value from a menu allowing multiple selections), you should apply the method:

public String [] getParameterValues(String name) {. . .}

to the HttpServletRequest object. It returns null if the parameter is absent (for example, if nothing was selected). Recall this example from the section on forms:

<select name=pets size=3 multiple>
  <option value=dog> Dog
  <option value=cat> Cat
  <option value=bird> Bird
  <option value=fish> Fish
</select>

The form may send the data in a GET request to the following servlet.


Handling Multi-valued Parameters

public class MultiValue extends HttpServlet {

  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws IOException, ServletException {
    response.setContentType("text/html") ;
    PrintWriter out = response.getWriter() ;

    String [] pets = request.getParameterValues("pets") ;

    out.println("<html><head></head><body>") ;
    out.println("Your pets:<p>") ;
    out.println("<table border cellspacing=0 cellpadding=5>") ;
    if(pets != null)  // null if no option was selected
      for (int i = 0 ; i < pets.length ; i++)
        out.println("<tr><td>" + pets [i] + "</td></tr>") ;
    out.println("</table>") ;
    out.println("</body></html>") ;
  }
}


Multi-part Data

Recall this (slightly modified) example from the section on forms:

<form method=post enctype="multipart/form-data"
      action="http://sirah.csit.fsu.edu:8081/dbc/servlet/MultiPart">
  Course: <input name=course size=20> <p>
  Students file: <input type=file name=students>
  . . .
</form>


The simple getParameter() approach does not appear to work for multi-part data (required for uploading files).

However, we can resort to a lower-level CGI-like approach—reading the posted data from an input stream, and decoding it “by hand”.


Displaying Raw Multi-part Data

public class MultiPart extends HttpServlet {
  public void doPost(HttpServletRequest req, HttpServletResponse resp)
      throws IOException, ServletException {
    resp.setContentType("text/html") ;
    PrintWriter out = resp.getWriter() ;
    out.println("<html><head></head><body>") ;
    String contentType = req.getContentType() ;
    out.println("content type:<br>" + contentType + "<br>") ;
    BufferedReader in = req.getReader() ;  // getReader() already returns a BufferedReader
    while(true) {
      String line = in.readLine() ;
      if(line == null) break ;
      out.println(line + "<br>") ;
    }
    out.println("</body></html>") ;
  }
}


Remarks

This servlet will simply print the value of the Content-Type header, followed by the raw version of the posted data.

In general it is not safe to combine this style of reading data, using getReader(), with the higher-level approach, using getParameter()—choose one or the other.


Multi-part Data Example

public void doPost(HttpServletRequest req, HttpServletResponse resp)
    throws IOException, ServletException {
  resp.setContentType("text/html") ;
  PrintWriter out = resp.getWriter() ;
  out.println("<html><head></head><body>") ;
  Vector students = new Vector() ;
  String course = parseFormData(students, req) ;
  out.println("course: " + course + "<br>") ;
  out.println("students: <br>") ;
  out.println("<table border cellspacing=0 cellpadding=5>") ;
  for (int i = 0 ; i < students.size() ; i++)
    out.println("<tr><td>" + (String) students.get(i) + "</td></tr>") ;
  out.println("</table>") ;
  out.println("</body></html>") ;
}


Multi-part Data Example (cont.)

public String parseFormData(Vector students, HttpServletRequest req)
    throws IOException {
  String contentType = req.getContentType() ;
  String boundary = "--" + contentType.substring( . . . ) ;
      // Extract part boundary from content type header
  BufferedReader in = req.getReader() ;
  String course = null ;
  String line = in.readLine() ;  // first boundary line
  while(! line.equals(boundary + "--")) {
    String header = in.readLine() ;
    String name = header.substring( . . . ) ;
        // Extract parameter name from content disposition header
    while(! in.readLine().equals("")) ;  // skip other part headers, up to the blank line
    if(name.equals("course")) {
      course = in.readLine() ;
      line = in.readLine() ;
    } else if(name.equals("students")) {
      while(true) {
        line = in.readLine() ;
        if(line.startsWith(boundary)) break ;
        students.addElement(line) ;
      }
    } else {  // skip any other part
      do line = in.readLine() ; while(! line.startsWith(boundary)) ;
    }
  }
  return course ;
}


Remarks

The parseFormData() implementation outlined here is schematic only.

Parsing the multi-part MIME encoded data is straightforward, but clearly fairly tedious.
– Servlets don’t give much help here.
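For comparison with the schematic parseFormData(), here is a runnable sketch of the same line-oriented approach. It assumes text-only parts, boundary lines with no trailing padding, and that the whole body can be read as characters (as with getReader()); binary uploads would need getInputStream() and byte-level parsing. MultipartSketch is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultipartSketch {
    // Matches name="..." but not the tail of filename="..."
    private static final Pattern NAME = Pattern.compile("(?:^|[;\\s])name=\"([^\"]*)\"");

    /** Extracts the boundary from e.g. "multipart/form-data; boundary=----XYZ". */
    static String boundaryOf(String contentType) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.regionMatches(true, 0, "boundary=", 0, 9)) {
                String b = p.substring(9);
                if (b.length() >= 2 && b.startsWith("\"") && b.endsWith("\"")) {
                    b = b.substring(1, b.length() - 1);      // the boundary may be quoted
                }
                return b;
            }
        }
        throw new IllegalArgumentException("no boundary in: " + contentType);
    }

    /** Returns each part's text lines, keyed by the name in its Content-Disposition header. */
    static Map<String, List<String>> parse(BufferedReader in, String contentType) throws IOException {
        String delimiter = "--" + boundaryOf(contentType);
        String closeDelimiter = delimiter + "--";
        Map<String, List<String>> parts = new LinkedHashMap<>();

        String line = in.readLine();
        while (line != null && !line.equals(delimiter)) {
            line = in.readLine();                            // skip any preamble
        }
        while (line != null && line.equals(delimiter)) {
            String name = null;
            while ((line = in.readLine()) != null && !line.isEmpty()) {   // part headers
                if (line.regionMatches(true, 0, "Content-Disposition:", 0, 20)) {
                    Matcher m = NAME.matcher(line);
                    if (m.find()) name = m.group(1);
                }
            }
            List<String> body = new ArrayList<>();
            while ((line = in.readLine()) != null
                    && !line.equals(delimiter) && !line.equals(closeDelimiter)) {
                body.add(line);                              // part content, one line at a time
            }
            if (name != null) parts.put(name, body);
        }
        return parts;                                        // stops at the close delimiter or EOF
    }

    /** Convenience overload for a body already held in a String. */
    static Map<String, List<String>> parse(String body, String contentType) {
        try {
            return parse(new BufferedReader(new StringReader(body)), contentType);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In doPost() it might be called as MultipartSketch.parse(req.getReader(), req.getContentType()), after which the "course" and "students" entries of the returned map replace the hand-written loop.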


Generating Responses


The HTTP Status Line

A minimal server response to a client request might be:

HTTP/1.1 200 OK
Content-Type: text/plain

Hello World!

We already saw how to set the content type explicitly using setContentType(). Here we are more interested in the first line of the response: the status line.

As the example suggests, a status value of 200 means the request was successfully serviced. For a servlet response, the Web server sets this status value by default.

A servlet can explicitly set other values by using the setStatus() method of HttpServletResponse.
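To connect the status line with the servlet methods, the following sketch assembles a complete minimal response by hand: status line, headers, a blank line, then the body, with CRLF line endings. It only illustrates the wire format (RawResponse is a hypothetical name); in a servlet the container writes all of this, using whatever setStatus() and setContentType() specified.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

class RawResponse {
    /** Assembles status line, headers, a blank line, then the body (all lines end in CRLF). */
    static String format(int status, String reason, Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append("HTTP/1.1 ").append(status).append(' ').append(reason).append("\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        // Content-Length counts bytes, not characters; UTF-8 is assumed here.
        sb.append("Content-Length: ")
          .append(body.getBytes(StandardCharsets.UTF_8).length)
          .append("\r\n");
        sb.append("\r\n");               // blank line ends the headers
        sb.append(body);
        return sb.toString();
    }
}
```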


HTTP Status Codes

100 Continue: Response to Expect request.
101 Switching Protocols: Response to Upgrade request.
200 OK: OK!
201 Created: Server created a document. URL follows.
202 Accepted: Processing is in progress.
203 Non-Authoritative Information:
204 No Content: No new document is available.
205 Reset Content: Clear form fields.
206 Partial Content: Response to Range request.
300 Multiple Choices: Trick question?
301 Moved Permanently: Document is elsewhere.
302 Found: Redirects the browser to a different URL.
303 See Other: Please use GET instead of POST.
304 Not Modified: Response to request with If-Modified-Since.
305 Use Proxy: Go to proxy at returned URL.
307 Temporary Redirect: Like 302.


HTTP Status Codes (cont.)

400 Bad Request: Syntax error.
401 Unauthorized: No appropriate Authorization header.
403 Forbidden: Not allowed with any authorization.
404 Not Found: Not at this address.
405 Method Not Allowed: Self explanatory.
406 Not Acceptable: Resource doesn’t match Accept header.
407 Proxy Authentication Required:
408 Request Timeout: Client took too long sending request.
409 Conflict: Used with PUT.
410 Gone: Document has gone.
411 Length Required: Content-Length missing (in POST).
412 Precondition Failed:
413 Request Entity Too Large: Document too big to handle.
414 Request URI Too Long: URI is too long.
415 Unsupported Media Type:
416 Requested Range Not Satisfiable:
417 Expectation Failed: Disillusioned?


HTTP Status Codes (cont.)

500 Internal Server Error: Server is confused.

501 Not Implemented: Requested functionality not supported.

502 Bad Gateway: Used by proxy servers.

503 Service Unavailable: Server overloaded or service down.

504 Gateway Timeout: Used by proxy servers.

505 HTTP Version Not Supported: Self explanatory.


Explicitly Returning Status Codes

These status values are available as predefined constants in the HttpServletResponse interface:

static final int SC_OK = 200 ;
static final int SC_FOUND = 302 ;
static final int SC_NOT_FOUND = 404 ;
etc.

The default status is equivalent to explicitly doing:

resp.setStatus(HttpServletResponse.SC_OK) ;

There are a couple of convenience methods on HttpServletResponse for dealing with common cases:
void sendError(int sc, String message)
– send the specified status, with a generated page containing message.
void sendRedirect(String location)
– send a 302 (Found) status, and include a Location header.


Redirecting the Browser

By sending the SC_FOUND or SC_TEMPORARY_REDIRECT status, together with a dynamically generated URL in a Location header, a servlet can cause the browser to go directly to a different page or site (without the user manually clicking another link).

Following is a simplified version of an example from “Core Servlets and Java Server Pages”.

It allows the user to specify a search string and a preferred search engine, dynamically generates a query URL for the chosen search engine, and redirects the browser to that URL.


Search-Engine Selection Servlet

public class Search extends HttpServlet {
  public void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException, ServletException {
    String searchEngine = req.getParameter("searchEngine") ;
    String searchString = req.getParameter("searchString") ;
    String url = null ;
    if("google".equals(searchEngine))
      url = "http://www.google.com/search?q=" + searchString ;
    if("lycos".equals(searchEngine))
      url = "http://lycospro.lycos.com/cgi-bin/pursuit?query=" + searchString ;
    if("hotbot".equals(searchEngine))
      url = "http://www.hotbot.com/?MT=" + searchString ;
    if(url == null)
      resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "Unknown search engine") ;
    else
      resp.sendRedirect(url) ;
  }
}


Remarks

The sendRedirect() call does everything necessary to create the response.

To deal with complex search strings, you should probably URL-encode searchString before appending it to url.

A possible form:

<form method=get
      action="http://sirah.csit.fsu.edu:8081/dbc/servlet/Search">
  Search engine: <p>
  Google: <input type=radio name=searchEngine value=google checked>
  Lycos: <input type=radio name=searchEngine value=lycos>
  Hotbot: <input type=radio name=searchEngine value=hotbot> <p>
  Search string: <input type=text name=searchString size=40>
  <input type=submit>
</form>
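A sketch of the URL-encoding step recommended above, using java.net.URLEncoder (the Charset overload needs Java 10 or later). The engine table reuses the 2001-era query prefixes from the Search servlet, which may no longer work; SearchUrls is a hypothetical name, and returning null for an unknown engine is a choice made here so that the servlet can answer with sendError() rather than redirect to a bogus URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class SearchUrls {
    // Query prefixes taken from the Search servlet; these old URLs may no longer work.
    private static final Map<String, String> PREFIXES = Map.of(
            "google", "http://www.google.com/search?q=",
            "lycos", "http://lycospro.lycos.com/cgi-bin/pursuit?query=",
            "hotbot", "http://www.hotbot.com/?MT=");

    /** Returns the redirect URL, or null if the engine is unknown or no search string was given. */
    static String redirectUrl(String engine, String searchString) {
        String prefix = (engine == null) ? null : PREFIXES.get(engine);  // Map.of rejects get(null)
        if (prefix == null || searchString == null) {
            return null;
        }
        // Encode so spaces, '&', '+' etc. stay inside a single query parameter.
        return prefix + URLEncoder.encode(searchString, StandardCharsets.UTF_8);
    }
}
```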


Introduction to Session Tracking


The Problem

HTTP is a stateless protocol—it provides no intrinsic way to associate one request/response transaction with any subsequent transactions.

But very often a Web application requires that the server engage in a non-trivial dialog with a single user, involving multiple client requests and server responses.

So the problem is to find ways to define and keep track of a particular “session” between browser and Web server.

This is called session tracking.


Solutions

There are three solutions in common use:
– Hidden Form Fields
Assumes all client requests associated with the session are form submissions. The forms must be dynamically generated by the server, and include hidden input fields that preserve session information.
– URL-Rewriting
Again assumes all pages associated with the session are dynamically generated by the server. Session information is directly appended to any URLs referring back to the server in the generated pages.
– Cookies
An extension to HTTP allows a server to ask a browser to store a small amount of persistent information. The browser returns this information in HTTP request headers, typically whenever the client revisits a Web server on the same host.


Example Using Hidden Form Fields

The classic example of “session information” is the contents of a customer’s shopping cart at an online store.

In the interests of fitting code in slides, we scale this down and deal with selections from a virtual snack-vending machine.
– “. . . Think of clocks and counters and telephones and board games and vending machines.” C.A.R. Hoare, Communicating Sequential Processes, 1985.


Snack-Vending Machine

public class VendingMachine extends HttpServlet {
  String[] snacks = {"Chips", "Popcorn", "Peanuts", . . . } ;

  public void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws . . . {
    resp.setContentType("text/html") ;
    PrintWriter out = resp.getWriter() ;
    String [] selections = req.getParameterValues("selection") ;
    out.println("<html><head></head><body>") ;
    for(int i = 0 ; i < snacks.length ; i++) {
      out.println("<form action=" + selectURL + ">") ;
      out.println("<input type=submit name=selection value=\"" + snacks [i] + "\">") ;
      printHidden(out, selections) ;  // print hidden fields
      out.println("</form>") ;
    }
    . . . generate form element for viewing current selections . . .
    out.println("</body></html>") ;
  }
}


Remarks

The servlet generates an HTML page with one form element for every snack.
– selectURL is a reference back to this servlet.

The submit button for each form sets a value for the parameter called selection: the value is the name of the snack.

Crucially, every form element in the generated page also sets again any pre-existing values for selection, using hidden input elements:

void printHidden(PrintWriter out, String [] selections) {
  if(selections != null)
    for(int j = 0 ; j < selections.length ; j++)
      out.println("<input type=hidden name=selection value=\"" + selections [j] + "\">") ;
}

The value of selections was returned by the call to getParameterValues(), earlier in the servlet method.


The Initial Page

If I go to the URL of the servlet, perhaps

http://sirah.csit.fsu.edu:8081/dbc/servlet/VendingMachine

I see a page with one submit button for each snack.


Generated Source of Initial Page

If we view the HTML source of the initial page, it includes a series of forms:

<html><head></head><body>

<form action=http://sirah...:8081/dbc/servlet/VendingMachine><input type=submit name=selection value="Chips"></form>

<form action=http://sirah...:8081/dbc/servlet/VendingMachine><input type=submit name=selection value="Popcorn"></form>

<form action=http://sirah...:8081/dbc/servlet/VendingMachine><input type=submit name=selection value="Peanuts"></form>

...

</body></html>

selections was null, so initially there are no hidden fields.


Making Selections

If I click on a couple of the selections on the initial page, apparently nothing changes—each selection returns a generated page that looks identical in the browser.

But if I view the generated HTML source. . .


Generated Source of Later Pages

<html><head></head><body>

<form action=http://sirah...:8081/dbc/servlet/VendingMachine><input type=submit name=selection value="Chips"><input type=hidden name=selection value="Peanuts"><input type=hidden name=selection value="Chips"></form>

<form action=http://sirah...:8081/dbc/servlet/VendingMachine><input type=submit name=selection value="Popcorn"><input type=hidden name=selection value="Peanuts"><input type=hidden name=selection value="Chips"></form>

...

</body></html>

Every form now contains hidden fields holding values that were in selections.


Handling the Accumulated “State”

The page returned by the VendingMachine servlet contains a final form generated by:

out.println("<form action=" + viewURL + ">") ;
out.println("View current selections: <input type=submit>") ;
printHidden(out, selections) ;
out.println("</form>") ;

Here viewURL is a reference to a second servlet, which generates a page containing the contents of the hidden fields.


Critique of Hidden Fields

The approach is quite elegant, but it has some problems:
– All interactions between client and server must go through forms.

– Every form on every generated page must include the hidden fields defining the session state.

– In our example, the number of hidden fields grew quickly.

All approaches to session tracking run into problems analogous to the last: one wishes to keep down the amount of hidden information that must be exchanged in every single transaction of a session.

For example, this will be important for the URL-rewriting approach, because we don’t want to end up with huge URLs.


Session IDs

The direct “hidden fields” approach does not store session state in any fixed place. The “state” is somehow encoded by the current point in an ongoing dialog.
– Perhaps reminiscent of simulation of state by lazy lists in functional programming languages??

This is interesting, but, as noted, it means that the associated information is constantly swapped between client and server.

An obvious solution is for the server to store the bulk of the data associated with each active session.

The only session information bounced back and forth between client and server is an immutable identifier for the session.


Improved Vending Machine Servlet

In an improved version of our vending machine servlet, the main servlet has a static variable, sessionTable.
– We make it static so it can be accessed by a separate servlet class, used for viewing or processing the current selections.

This sessionTable is a HashMap. It is keyed by a session ID string. The associated values are “records” describing the current state of the session.

In our simple example, each session-state “record” is a Vector containing the items selected thus far.

In our example, the session ID is a random number generated when the servlet is initially called (without a sessionID parameter).

This number is embedded as a hidden field in the generated pages, and thus returned in subsequent transactions.


A Second Vending Machine

static HashMap sessionTable = new HashMap() ;
Random rand = new Random() ;  // Seeded by current date/time

public void doGet(HttpServletRequest req, HttpServletResponse resp)
    throws . . . {
  . . .
  String sessionID = req.getParameter("sessionID") ;
  if(sessionID == null) {  // First invocation in this session
    sessionID = "" + rand.nextInt() ;
    sessionTable.put(sessionID, new Vector()) ;
  } else {                 // Subsequent invocation
    Vector selections = (Vector) sessionTable.get(sessionID) ;
    String selection = req.getParameter("selection") ;
    if(selection != null) selections.addElement(selection) ;
  }
  . . . Print single hidden field in all forms:
  out.println("<input type=hidden name=sessionID " + "value=" + sessionID + ">") ;
  . . .
}


Remarks

Our naive implementation does not worry about issues of thread safety.

Strictly speaking, accesses to sessionTable should be synchronized, e.g.:

synchronized(sessionTable) { sessionTable.put(sessionID, selections) ; }

This is sufficiently safe if we make the often-reasonable assumption that there are no concurrently active transactions involving the same session.
– Without this assumption, access to the individual session records should be synchronized as well.
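A sketch of a session table that avoids explicit locking of the table by using java.util.concurrent.ConcurrentHashMap (Java 5 and later), with a synchronized list for each session record. It also makes two deliberate substitutions relative to the vending machine code: IDs come from java.security.SecureRandom rather than java.util.Random, since guessable IDs would let one user step into another's session, and putIfAbsent() ensures a new ID never overwrites an existing session. SessionTable and its methods are hypothetical.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SessionTable {
    private final Map<String, List<String>> sessions = new ConcurrentHashMap<>();
    private final SecureRandom rand = new SecureRandom();

    /** Creates an empty session under a fresh, unused ID and returns the ID. */
    String newSession() {
        while (true) {
            String id = Long.toHexString(rand.nextLong());
            // putIfAbsent is atomic: two threads can never claim the same ID
            if (sessions.putIfAbsent(id, Collections.synchronizedList(new ArrayList<String>())) == null) {
                return id;
            }
        }
    }

    /** Records a selection; returns false for an unknown ID (e.g. after a server restart). */
    boolean addSelection(String id, String selection) {
        List<String> selections = (id == null) ? null : sessions.get(id);
        if (selections == null) {
            return false;
        }
        selections.add(selection);     // the synchronized list guards each session record
        return true;
    }

    /** Returns a copy of the session's selections, or null for an unknown ID. */
    List<String> selections(String id) {
        List<String> selections = (id == null) ? null : sessions.get(id);
        if (selections == null) {
            return null;
        }
        synchronized (selections) {    // copy under the list's own lock
            return new ArrayList<>(selections);
        }
    }
}
```

Returning false (or null) for an unknown ID gives the servlet a chance to respond sensibly when a browser presents an ID the server no longer recognizes.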

The selection-viewing servlet can access the session table in the first servlet class by VendingMachine2.sessionTable.


Server Restarts

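Our simplified implementation will fail ungracefully if the server is restarted while a browser is in the middle of a session: sessionTable.get() returns null for the old ID, and the call to addElement() throws a NullPointerException.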

The session record disappears, while the session ID may still be stored in the browser.

Unless session data is stored persistently there is no completely satisfactory solution, but a servlet writer should be aware of this possibility, and code defensively (perhaps sending an explanatory message to the browser).


URL-Rewriting

URL-rewriting can be regarded as an optimization of the hidden fields approach.

Assuming a form with a hidden field is submitted using the GET method, what the server really sees is just a request whose URI has been extended with an encoding of the value in the hidden field.

In URL-rewriting we cut out the role of the browser (encoding session data from hidden fields) and directly extend the URL in the action attribute of the form with an encoding of the session data.

As a byproduct, this also works for URLs in anchor elements (simple hypertext links).


A Third Vending Machine

public void doGet(HttpServletRequest req, HttpServletResponse resp)
    throws . . . {
  . . .
  String sessionID ;
  String pathInfo = req.getPathInfo() ;
  if(pathInfo == null) {  // First invocation in this session
    sessionID = "" + rand.nextInt() ;
    sessionTable.put(sessionID, new Vector()) ;
  } else {                // Subsequent invocation
    sessionID = pathInfo.substring(1) ;  // Strip leading "/"
    Vector selections = (Vector) sessionTable.get(sessionID) ;
    String selection = req.getParameter("selection") ;
    if(selection != null) selections.addElement(selection) ;
  }
  . . .
  out.println("<form action=" + selectURL + "/" + sessionID + ">") ;
  out.println("<input type=submit name=selection . . . >") ;
  out.println("</form>") ;
  . . .
}


Remarks

The session ID information is appended to the servlet URL in the action attribute of the forms (recall selectURL is the URL of this servlet).

On any invocation after the first in the session, this information can be retrieved using getPathInfo().

The getPathInfo() method on HttpServletRequest returns any text in the request URL following the servlet name (up to and excluding the ? that delimits the query string, if there is one).
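The path-info round trip can be sketched without a container. Here pathInfo() is a hypothetical stand-in for getPathInfo() that works on whole URL strings; the real method works on the request URI and knows the servlet's mapping, which this sketch simply assumes is servletUrl. PathInfoSessions is likewise a hypothetical name.

```java
class PathInfoSessions {
    /** URL-rewriting: append the session ID as extra path info after the servlet URL. */
    static String rewrite(String servletUrl, String sessionId) {
        return servletUrl + "/" + sessionId;
    }

    /** Stand-in for getPathInfo(): text after the servlet URL, up to (not including) any '?'. */
    static String pathInfo(String requestUrl, String servletUrl) {
        if (!requestUrl.startsWith(servletUrl)) {
            return null;
        }
        String rest = requestUrl.substring(servletUrl.length());
        int query = rest.indexOf('?');
        if (query >= 0) {
            rest = rest.substring(0, query);
        }
        return rest.startsWith("/") ? rest : null;   // null when there is no extra path
    }

    /** Mirrors pathInfo.substring(1) in the servlet, but tolerates null and a bare "/". */
    static String sessionIdFrom(String pathInfo) {
        if (pathInfo == null || pathInfo.length() < 2) {
            return null;
        }
        return pathInfo.substring(1);                // strip leading "/"
    }
}
```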

We now have the option to replace the form that connects to a selection-viewing servlet with a simple anchor element.
– The URL in the anchor element is extended with the session ID information, just like the action attribute in a form.


Cookies


Cookies

A cookie is a small piece of contextual information embedded in an HTTP response from a Web server.

If a browser receives an HTTP response including a Set-Cookie header (and it is willing to accept cookies) it stores this information.

The information can either be stored in the memory of the running browser program (“session cookies”) or saved to disk (“persistent cookies”).

Subsequently, whenever the browser constructs an HTTP request for a server, it checks if it is storing any cookies for the server involved.

If so, it returns the cookie information to the server, in a Cookie header in the new request.


Uses of Cookies

Recognizing a regular customer
– A persistent cookie can save some identification information for the particular customer. The stored information may be actual name and details, or (preferably) some key into a database on the server.
– When the customer returns to the site, associated information (mailing address, etc) is already known; it doesn’t have to be entered anew by the customer.
– There are many variations on this theme, e.g. it allows portal sites to do focussed advertising.

Session Tracking
– Within the context of a single “visit” to a site, cookies can be used as an alternative to hidden fields or URL-rewriting, as the underlying mechanism for session tracking.


Abuses of Cookies

A poorly constructed commercial site might use cookies to store sensitive information (e.g. credit card numbers) on the hard disk of your PC. This might be a privacy problem if the PC is shared by several users.

A Web site can persuade a browser to send a cookie to a third-party site, by embedding an image that comes from the Web server of the third party.
– The third-party site might offer the original site collated information on its visitors.
– It may be a particular nuisance if the third party has previously harvested the email address of the user, e.g. by sending them an HTML email containing a cookie-setting icon.
– Moral: configure your browser to only send cookies to the actual page you are visiting?


Limits to Cookies

Typically a browser will restrict the number and size of cookies it will accept, e.g.:
– Maximum of 20 cookies per site,
– Maximum of 300 cookies total (from all sites),
– Maximum size of individual cookie is 4 kilobytes.

Users may of course configure their browsers to refuse all cookies, or only accept selected cookies.

Hence a Web application should not rely on cookies for basic functionality—only for “added value”.


The Servlet Cookie API

The servlet creates a cookie by using a constructor for the class Cookie.

Various attributes can be set for the cookie before sending it to the client. They include:
– The name and value of the cookie. These are usually set in the Cookie constructor.
– The domain to which the cookie should be returned (setDomain()). By default the cookie will only be returned to the server that sent it, but this default can be overridden.
– The URI path to which the cookie should be returned (setPath()). By default, the cookie is only returned to pages in the same directory as the page that sent the cookie, or in its subdirectories.
– The time when a persistent cookie expires (setMaxAge()), e.g. the cookie should be deleted by the browser after one hour, after one year, etc.
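For a feel of what these attributes become on the wire, the following sketch builds a Set-Cookie header value by hand. It is an illustration only, assuming the original Netscape attribute names (Domain, Path, Expires) and an RFC 1123 date; Tomcat 3.1, for instance, wrote the date with dashes, as in the response trace later in this section. In a servlet, the Cookie class and addCookie() do this for you; SetCookieHeader is a hypothetical name, and the current time is passed in so the output is reproducible.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class SetCookieHeader {
    /**
     * Builds a Set-Cookie header value. domain and path may be null (attribute omitted);
     * a negative maxAgeSeconds means a session cookie, so no Expires attribute is sent.
     */
    static String build(String name, String value, String domain, String path,
                        int maxAgeSeconds, Instant now) {
        StringBuilder sb = new StringBuilder(name).append('=').append(value);
        if (domain != null) {
            sb.append("; Domain=").append(domain);
        }
        if (path != null) {
            sb.append("; Path=").append(path);
        }
        if (maxAgeSeconds >= 0) {
            String expires = DateTimeFormatter.RFC_1123_DATE_TIME
                    .format(now.plusSeconds(maxAgeSeconds).atZone(ZoneOffset.UTC));
            sb.append("; Expires=").append(expires);
        }
        return sb.toString();
    }
}
```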


A Servlet that Sets Two Cookies

public class SetCookies extends HttpServlet {
  Random rand = new Random() ;  // Seeded by current date/time

  public void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws . . . {
    resp.setContentType("text/html") ;

    Cookie session = new Cookie("mySessionCookie", "" + rand.nextInt()) ;
    resp.addCookie(session) ;

    Cookie persistent = new Cookie("myPersistentCookie", "" + rand.nextInt()) ;
    persistent.setMaxAge(3600) ;  // One hour
    resp.addCookie(persistent) ;

    PrintWriter out = resp.getWriter() ;
    out.println("<html><head></head><body>") ;
    out.println("<h1>Enjoy the cookies!</h1>") ;
    out.println("</body></html>") ;
  }
}


Remarks

The arguments of the Cookie constructor are the cookie name and value.

Here the value is a random number.

Cookie names or values should not include white space or any of the characters [ ] ( ) = , " / ? @ : ;

To make a cookie persistent, set the expiration time in seconds using setMaxAge().

By default the expiration time is negative, indicating a session cookie.
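The character restriction on names and values can be checked before a Cookie is constructed. This sketch encodes the list given in the remarks, plus control characters; it is a conservative check of our own, not the servlet API's validation rule, and CookieTokens is a hypothetical name.

```java
class CookieTokens {
    // Separator characters the remarks say to avoid in cookie names and values.
    private static final String SEPARATORS = "[]()=,\"/?@:;";

    /** True if s contains no white space, control characters, or listed separators. */
    static boolean isSafe(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c) || Character.isISOControl(c) || SEPARATORS.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    /** Cookie names must additionally be non-empty. */
    static boolean isSafeName(String name) {
        return name != null && !name.isEmpty() && isSafe(name);
    }
}
```

A value that fails the check could be URL-encoded before it is stored, at the cost of decoding it again when the cookie comes back.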


Viewing the Set-Cookie Headers

After deploying this servlet, we can view the headers it returns by modifying the TrivialBrowser class (introduced in the network programming lecture) to take host, port, and path arguments. . .


HTTP Response Including Set-Cookie

java TrivialBrowser sirah.csit.fsu.edu 8081 /dbc/servlet/SetCookies

HTTP/1.0 200 OK
Date: Mon, 13 Nov 2000 15:49:25 GMT
Servlet-Engine: Tomcat Web Server/3.1 (JSP 1.1; Servlet 2.2; Java 1.2.2; Linux 2.2.14-5.0 i386; java.vendor=Sun Microsystems Inc.)
Set-Cookie: mySessionCookie=1367792973
Set-Cookie: myPersistentCookie=1264283064;Expires=Mon, 13-Nov-2000 16:49:25 GMT
Content-Language: en
Content-Type: text/html
Status: 200

<html><head></head><body><h1>Enjoy the cookies!</h1></body></html>


Browser Behavior

If we visit the SetCookies servlet with a real browser, we just see a message: Enjoy the cookies!

Now if we point the browser at our earlier Headers servlet (extended to accept GET requests), we may see something like:

User-Agent:Mozilla/4.51 [en] (X11; I; SunOS 5.7 sun4u)...

Cookie:mySessionCookie=1367792973; myPersistentCookie=1264283064...

The browser is returning the cookies in a Cookie header.
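Underneath, getCookies() has to split that header apart. A minimal sketch of the splitting, assuming the simple "name=value; name=value" form shown above; it skips RFC 2109 attributes such as $Version and ignores quoted values. CookieHeader is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CookieHeader {
    /** Splits a Cookie request header such as "a=1; b=2" into name/value pairs, in order. */
    static Map<String, String> parse(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (header == null) {
            return cookies;                      // the request carried no cookies
        }
        for (String pair : header.split(";")) {
            String p = pair.trim();
            int eq = p.indexOf('=');
            if (eq <= 0) {
                continue;                        // empty or malformed entry
            }
            String name = p.substring(0, eq).trim();
            if (name.startsWith("$")) {
                continue;                        // RFC 2109 attributes such as $Version, $Path
            }
            // Keep the first occurrence if a name repeats (e.g. cookies set for different paths).
            cookies.putIfAbsent(name, p.substring(eq + 1).trim());
        }
        return cookies;
    }
}
```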


Retrieving Cookies with the Cookie API

The previous example used only the generic getHeader() method to view the HTTP Cookie header returned by the browser.

Of course the cookie API provides higher-level methods to do this.


Displaying Cookies

public class ShowCookies extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse resp) throws . . . {
    resp.setContentType("text/html") ;
    Cookie [] cookies = req.getCookies() ;  // null if no cookies were sent
    PrintWriter out = resp.getWriter() ;
    out.println("<html><head></head><body>") ;
    out.println("You returned cookies:<p>") ;
    out.println("<table border cellspacing=0 cellpadding=5>") ;
    . . . write a header row . . .
    if (cookies != null) {
      for (int i = 0 ; i < cookies.length ; i++) {
        Cookie cookie = cookies [i] ;
        out.println("<tr><td>" + cookie.getName() + "</td>") ;
        out.println("<td>" + cookie.getValue() + "</td></tr>") ;
      }
    }
    out.println("</table>") ;
    out.println("</body></html>") ;
  }
}


Remarks

The method getCookies() returns an array of Cookie objects, or null if the request contains no cookies.

The methods getName() and getValue() on a Cookie object return the name and value of the cookie.

The API has no method for extracting just a cookie with a specified name from the request (i.e. nothing directly analogous to getParameter()).


Visiting ShowCookies

(Screenshots omitted: the ShowCookies page initially, after visiting SetCookies, and after restarting the browser.)


Session Tracking Using Cookies

Our penultimate version of the vending machine servlet uses cookies instead of URL-rewriting.

We need to define a method to retrieve the value of the cookie with a given name, e.g.:

String getCookieValue(HttpServletRequest req, String name) {
  Cookie [] cookies = req.getCookies() ;
  if (cookies == null)  // request carried no cookies
    return null ;
  for (int i = 0 ; i < cookies.length ; i++) {
    Cookie cookie = cookies [i] ;
    if (cookie.getName().equals(name))
      return cookie.getValue() ;
  }
  return null ;
}


A Fourth Vending Machine

public void doGet(HttpServletRequest req, HttpServletResponse resp) throws . . . {
  . . .
  String sessionID = getCookieValue(req, "vending_session") ;
  if (sessionID == null) {
    // First invocation in this session
    sessionID = "" + rand.nextInt() ;
    resp.addCookie(new Cookie("vending_session", sessionID)) ;
    sessionTable.put(sessionID, new Vector()) ;
  }
  else {
    // Subsequent invocation
    Vector selections = (Vector) sessionTable.get(sessionID) ;
    String selection = req.getParameter("selection") ;
    if (selection != null)
      selections.addElement(selection) ;
  }
  . . .
  out.println("<form action=" + selectURL + ">") ;
  out.println("<input type=submit name=selection . . . >") ;
  out.println("</form>") ;
  . . .
}
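Stripped of the servlet API, the session bookkeeping in this servlet is just a table keyed by the cookie value. The following hypothetical model (CookieVendingModel, with an ArrayList in place of the Vector) mirrors the logic above: the first request gets a fresh random ID and an empty list, and later requests append their selection. As in the servlet, a selection submitted on the very first request is not recorded. Unlike the servlet, an unknown ID (for example after a server restart) is silently ignored rather than causing a NullPointerException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, servlet-free model of the fourth vending machine's logic:
// selections are stored in a table keyed by the "vending_session" cookie value.
final class CookieVendingModel {
    private final Map<String, List<String>> sessionTable = new HashMap<>();
    private final Random rand = new Random();

    // cookieValue is what getCookieValue() returned (null on the first visit).
    // Returns the session ID the browser should hold afterwards.
    synchronized String handle(String cookieValue, String selection) {
        if (cookieValue == null) {
            // First invocation: invent an ID and register an empty list.
            String sessionID = "" + rand.nextInt();
            sessionTable.put(sessionID, new ArrayList<>());
            return sessionID;
        }
        List<String> selections = sessionTable.get(cookieValue);
        if (selections != null && selection != null) {
            selections.add(selection); // unknown IDs are ignored
        }
        return cookieValue;
    }

    // Returns a copy of the selections for a session, or null if unknown.
    synchronized List<String> selections(String sessionID) {
        List<String> s = sessionTable.get(sessionID);
        return s == null ? null : new ArrayList<>(s);
    }
}
```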


Remarks

We no longer have to worry about rewriting forms and anchor elements.

A “session” can now be tracked across links through intervening static HTML pages.

However:
– This version provides no way to terminate a session, short of restarting the browser.
– There are subtle questions about how the “scope” of a session is delimited.
– A functional programmer might argue that we lost “referential transparency”??


The Servlet Session Tracking API


The Session Tracking API

We have already illustrated several underlying approaches to session tracking:
– Hidden fields, URL-rewriting, cookies.

In general an application has to make a choice between these mechanisms, taking into account support in server and browser.

Session cookies may be favored on grounds of generality and flexibility, but not all clients will accept them.

In practice the servlet programmer does not have to worry too much about these issues. A high-level API is provided that will transparently choose and deploy a suitable low-level tracking mechanism.


The HttpSession Interface

A particular session is represented by an object implementing the HttpSession interface.

A session is defined as an association, lasting for some period, between a particular browser and a particular group of servlets on a server.

The current session is obtained by applying the method getSession() to the HttpServletRequest.

If no session object currently exists for this browser/servlet association, one will be created on the first call to getSession().
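The effect of the create flag can be modeled with a simple table of sessions keyed by ID. In this hypothetical sketch (ToySessionManager is not a real class), passing create = false mirrors the real getSession(false), which returns null instead of creating a session. The browserId argument stands for whatever ID arrived via a cookie or a rewritten URL. UUIDs stand in for the container's random session IDs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical toy model of getSession(create): look up the session for the
// ID the browser presented, optionally creating one if none exists.
final class ToySessionManager {
    static final class ToySession {
        final String id;
        ToySession(String id) { this.id = id; }
    }

    private final Map<String, ToySession> sessions = new HashMap<>();

    // browserId is the ID carried by the request, or null if there was none.
    synchronized ToySession getSession(String browserId, boolean create) {
        if (browserId != null) {
            ToySession existing = sessions.get(browserId);
            if (existing != null) {
                return existing; // existing browser/servlet association
            }
        }
        if (!create) {
            return null; // like getSession(false): never creates a session
        }
        ToySession fresh = new ToySession(UUID.randomUUID().toString());
        sessions.put(fresh.id, fresh);
        return fresh;
    }
}
```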


Simple Example

public class GetSession extends HttpServlet {

  static final String myURL = . . . URL of this servlet . . . ;

  public void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws . . . {
    HttpSession session = req.getSession(true) ;
    resp.setContentType("text/html") ;
    PrintWriter out = resp.getWriter() ;
    out.println("<html><head></head><body>") ;
    out.println("<a href=" + resp.encodeURL(myURL) + ">" + "View servlet again</a>") ;
    out.println("</body></html>") ;
  }
}


Remarks

The true argument of getSession() means that a new session object will be created if one does not already exist.
– For robustness you should probably always use this argument. (The documentation says that the form of getSession() without an argument is equivalent. Experience suggests maybe not?)

This servlet simply outputs a link back to itself (it doesn’t explicitly use the session object).

One important thing to note is the call to encodeURL().

This method should be applied to any URLs in the generated page that refer back to the same servlet context.

This supports URL-rewriting (if this is the session-tracking strategy adopted for the session).


Viewing the Generated Page

Pointing the browser at this page, we see a page containing a link: View servlet again

If we view the HTML source of this generated page, we may see something like:

<html><head></head><body><a href=http://sirah.csit.fsu.edu:8081/dbc/servlet/GetSession;jsessionid=To1019mC0 . . . 365At > View servlet again </a></body></html>

The URL in this first generated page has been rewritten to include a jsessionid parameter, appended to the path after a semicolon. The associated value is a long, random-looking string.
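On the server side, a rewritten request path has to be split back into the real path and the session ID. Here is a minimal sketch, assuming the ;jsessionid= form shown above. RewrittenPath is a hypothetical helper, not Tomcat's code, and it treats the ID as ending at a query string or at the end of the path.

```java
// Hypothetical sketch of the server-side half of URL-rewriting: strip a
// ";jsessionid=..." path parameter from a request path and recover the ID.
final class RewrittenPath {
    static final String MARKER = ";jsessionid=";

    final String path;      // the path with the session parameter removed
    final String sessionId; // null if the path was not rewritten

    private RewrittenPath(String path, String sessionId) {
        this.path = path;
        this.sessionId = sessionId;
    }

    static RewrittenPath parse(String requestPath) {
        int start = requestPath.indexOf(MARKER);
        if (start < 0) {
            return new RewrittenPath(requestPath, null);
        }
        int idStart = start + MARKER.length();
        // The ID ends at the next '?' (query string) or at the end of the path.
        int end = requestPath.indexOf('?', idStart);
        if (end < 0) {
            end = requestPath.length();
        }
        String id = requestPath.substring(idStart, end);
        String rest = requestPath.substring(0, start) + requestPath.substring(end);
        return new RewrittenPath(rest, id);
    }
}
```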


Checking the Cookies

Before doing anything else, we visit the ShowCookies servlet.

We may see something like the following. (Screenshot omitted: the ShowCookies table listing the session cookie.)

In the first HTTP response after the session is created, the servlet both rewrites URLs and sends a cookie to the browser.

The same session ID appears in both places.


Revisiting the Servlet

Going back to the GetSession servlet, if I follow the link to view the servlet again, I see a page that looks the same.

But if I view the HTML source again I may see:

<html><head></head><body><a href=http://sirah.csit.fsu.edu:8081/dbc/servlet/GetSession> View servlet again </a></body></html>

This time the URL is not rewritten.


Selecting a Session Tracking Mechanism

As noted, in the first response after a session is created, the servlet both sends a cookie and rewrites URLs.

If the browser returns the session ID cookie in a subsequent request, URL-rewriting can be disabled.

If the browser is not returning cookies, URL-rewriting will continue.

All this happens “behind the scenes”: the servlet programmer may not even be aware of the mechanism.
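The selection policy just described can be sketched as a small decision function. This is a hypothetical model of what an encodeURL()-style method decides, not the container's actual code. In particular, the real method also checks that the URL refers back to the same servlet context, and that check is not modeled here.

```java
// Hypothetical sketch of an encodeURL()-style decision: rewrite the URL only
// while the browser has not yet shown that it returns the session cookie.
final class UrlEncoderSketch {
    private UrlEncoderSketch() {}

    static String encodeURL(String url, String sessionId, boolean cookieReturned) {
        if (sessionId == null || cookieReturned) {
            return url; // no session, or cookies are working: leave URL alone
        }
        // Insert ";jsessionid=ID" before any query string.
        int q = url.indexOf('?');
        if (q < 0) {
            return url + ";jsessionid=" + sessionId;
        }
        return url.substring(0, q) + ";jsessionid=" + sessionId + url.substring(q);
    }
}
```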


Binding Information to a Session

Of course this is not particularly useful unless we have a way to associate application information with the session.

In previous examples we used a HashMap, keyed by session ID, to store session data.

We may assume that analogous mechanisms are used behind the scenes in the session-tracking API, but the session ID is not usually directly accessed by the programmer.

Instead, the application programmer just sees the HttpSession object. Methods are available to directly “cache” information in this object.
– The session object itself behaves like a simple collection class.


Some Methods on HttpSession

public void setAttribute(String name, Object value)
Add a reference to the object value to the session object, keyed by the string name.

public void removeAttribute(String name)
Remove the value associated with the key name from the session.

public Object getAttribute(String name)
Extract the value associated with the key name (or null if no value is bound to that name).

Note the value object may implement HttpSessionBindingListener, in which case it will be notified when it is added to or removed from a session.
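The three attribute methods and the binding notification can be modeled with a map. The following sketch is hypothetical. AttributeSession and its nested BindingListener only illustrate the behavior described above. BindingListener stands in for HttpSessionBindingListener, whose real methods receive an HttpSessionBindingEvent rather than a name. As in the servlet specification, setting an attribute to null is treated as removing it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, servlet-free model of HttpSession's attribute methods.
final class AttributeSession {
    // Stand-in for HttpSessionBindingListener (simplified: takes only the name).
    interface BindingListener {
        void valueBound(String name);
        void valueUnbound(String name);
    }

    private final Map<String, Object> attributes = new HashMap<>();

    synchronized Object getAttribute(String name) {
        return attributes.get(name); // null if nothing is bound under name
    }

    synchronized void setAttribute(String name, Object value) {
        if (value == null) {
            removeAttribute(name); // binding null is treated as a removal
            return;
        }
        Object old = attributes.put(name, value);
        if (old instanceof BindingListener) {
            ((BindingListener) old).valueUnbound(name); // replaced value is unbound
        }
        if (value instanceof BindingListener) {
            ((BindingListener) value).valueBound(name);
        }
    }

    synchronized void removeAttribute(String name) {
        Object old = attributes.remove(name);
        if (old instanceof BindingListener) {
            ((BindingListener) old).valueUnbound(name);
        }
    }
}
```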


Session Attributes vs. Instance Variables

In well-written Java programs, local variables are normally declared inside methods to hold values that are computed and used by only a single method invocation.

Typically, instance variables are used to hold values that need to be shared across multiple invocations.

In servlet programming—where several sessions may be concurrently operating on the single servlet instance—this role for instance variables is naturally taken over by attributes of the session object.

Think hard before declaring an instance variable in a servlet. In many cases you should probably be using a session attribute instead.
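Here is a small deterministic illustration of the hazard. One hypothetical servlet-like object (SharedInstanceDemo) serves two sessions. The instance variable lastSelection is overwritten by whichever session wrote last. A map keyed by session ID, standing in here for a session attribute, keeps each user's value separate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: one "servlet" instance serving two sessions.
final class SharedInstanceDemo {
    private String lastSelection; // instance variable: shared by ALL sessions
    private final Map<String, String> perSession = new HashMap<>(); // keyed by session ID

    synchronized void select(String sessionId, String selection) {
        lastSelection = selection;
        perSession.put(sessionId, selection);
    }

    synchronized String viaInstanceVariable(String sessionId) {
        return lastSelection; // ignores sessionId: whoever wrote last wins
    }

    synchronized String viaSessionAttribute(String sessionId) {
        return perSession.get(sessionId);
    }
}
```

After session "alice" selects cola and session "bob" selects chips, the instance variable reports chips to both users, while the per-session lookup still gives alice her cola.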


A Final (?) Vending Machine

The first operation in the doGet() method is to retrieve or create a session object using getSession().

We then attempt to extract a Vector object called selections from the session.

If we fail, we can assume this is the first transaction in this session. A new Vector object is created, and added to the new session.

Form parameters are added to the Vector as usual.

Whenever URLs referring back to this servlet context appear in the generated HTML, they are passed through encodeURL().


A Fifth Vending Machine

public void doGet(HttpServletRequest req, HttpServletResponse resp) throws . . . {
  HttpSession session = req.getSession(true) ;
  Vector selections = (Vector) session.getAttribute("selections") ;
  if (selections == null) {
    // First invocation in this session
    selections = new Vector() ;
    session.setAttribute("selections", selections) ;
  }

  String selection = req.getParameter("selection") ;
  if (selection != null)
    selections.addElement(selection) ;

  . . .
  out.println("<form action=" + resp.encodeURL(selectURL) + ">") ;
  out.println("<input type=submit name=selection . . . >") ;
  out.println("</form>") ;
  . . .
}


Remarks

It is still recommended to use synchronized blocks to ensure thread safety, because several requests belonging to the same session (for example, from two windows of one browser) may be processed concurrently. You can use the session object for synchronization, e.g.:

synchronized (session) {
  Vector selections = (Vector) session.getAttribute("selections") ;
  if (selections == null) {
    selections = new Vector() ;
    session.setAttribute("selections", selections) ;
  }
}
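The synchronized get-or-create idiom can be exercised outside a container. In this sketch a plain Map stands in for the HttpSession object, and the class GetOrCreate is hypothetical. Because the lookup and the insertion happen inside one synchronized block, concurrent first requests in the same session all end up sharing a single Vector instead of each creating their own.

```java
import java.util.Map;
import java.util.Vector;

// Hypothetical sketch of the synchronized get-or-create idiom above, with a
// plain Map standing in for the HttpSession object.
final class GetOrCreate {
    private GetOrCreate() {}

    @SuppressWarnings("unchecked")
    static Vector<String> selections(Map<String, Object> session) {
        synchronized (session) {
            Vector<String> selections = (Vector<String>) session.get("selections");
            if (selections == null) {
                // First request in this session: create and bind the list.
                selections = new Vector<>();
                session.put("selections", selections);
            }
            return selections;
        }
    }
}
```

Without the synchronized block, two threads could both see null and each bind a new Vector, so one thread's selection would be lost.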

As usual, the vending machine servlet will lead to a selection-viewing servlet when the user follows a suitable link.

These two servlets automatically share the same session object, and thus session information, because they are in the same servlet context.


The Scope of a Session

A servlet context is a group of servlets (and possibly other Web entities), collected together in some directory.

Under Tomcat, servlet contexts are defined in the server.xml file.
– In the examples so far, the servlet context was /dbc.

Several servlets may be involved in the same session, hence share the same HttpSession object.

This sharing is automatic if the servlets are in the same context, and are interacting with the same browser.

Servlets from different contexts in the same server, or interacting with different browsers, always have distinct HttpSession objects.


Life-Time of a Session

In general a session expires after some interval. The method:

public void setMaxInactiveInterval(int seconds)

on HttpSession can be used to request that the session will be invalidated if there has been no transaction during a period of the specified length.

The method: public void invalidate()

on HttpSession can be used to immediately invalidate a session.
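The two ways a session can end can be modeled with a few fields. This hypothetical sketch (SessionLifetime is not the container's class) takes the current time as an explicit argument so the behavior is easy to test. As with the real setMaxInactiveInterval(), a negative interval means the session never times out.

```java
// Hypothetical model of a session's life-time rules: invalidate() ends it at
// once; otherwise it expires after maxInactiveInterval seconds with no request.
// Times are passed in explicitly, in milliseconds.
final class SessionLifetime {
    private long lastAccessedMillis;
    private int maxInactiveIntervalSeconds; // negative: never times out
    private boolean invalidated;

    SessionLifetime(long createdMillis, int maxInactiveIntervalSeconds) {
        this.lastAccessedMillis = createdMillis;
        this.maxInactiveIntervalSeconds = maxInactiveIntervalSeconds;
    }

    void setMaxInactiveInterval(int seconds) { maxInactiveIntervalSeconds = seconds; }

    void invalidate() { invalidated = true; }

    // Records a request at nowMillis; returns false if the session had already ended.
    boolean access(long nowMillis) {
        if (!isValid(nowMillis)) {
            invalidated = true;
            return false;
        }
        lastAccessedMillis = nowMillis;
        return true;
    }

    boolean isValid(long nowMillis) {
        if (invalidated) {
            return false;
        }
        if (maxInactiveIntervalSeconds < 0) {
            return true;
        }
        return nowMillis - lastAccessedMillis <= maxInactiveIntervalSeconds * 1000L;
    }
}
```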
