Part 6: Introduction to CGI and Servlets
CIS 5930-04 – Spring 2001
http://aspen.csit.fsu.edu/it1spring01
Instructors: Geoffrey Fox, Bryan Carpenter
Computational Science and Information Technology, Florida State University
Acknowledgements: Nancy McCracken, Syracuse University
RMI gave us one approach to client/server programming.
The approach was based on the Java language and some far-reaching ideas about remote objects, object serialization, and dynamic class loading.
We could achieve direct integration into the traditional World Wide Web through applets, but the technology is not specifically tied to the Web.
RMI is powerful and general (and interesting), but it can be a slightly heavy-handed approach if actually we only need to interact with users through Web pages.
For the future, it may be more natural to view RMI as a technology for the “middle tier” (or for connectivity in the LAN) rather than for the Web client.
Invocation of a single Java method is typically much cheaper than starting a whole new program, so servlets are typically more efficient than CGI scripts.
– This is important if we are planning to centralize processing in the server (rather than, say, delegate processing to an applet or browser script).
Besides this we have the usual advantages of Java:
– Portability,
– A fully object-oriented environment for large-scale program development.
– Library infrastructure for decoding form data, handling cookies, etc. (although many of these things are also available in Perl).
– Servlets are the foundation for Java Server Pages.
The HTTP GET request consists of a series of text fields on separate lines, ended by an empty line.
The first line is the most important: it is called the method field.
In simple GET requests, the second token in the method line is the requested file name, expressed as a path relative to the document root of the server.
When the form specifying the get method is submitted, the values inputted by the user are effectively appended to the end of the URL specified in the action attribute.
In the HTTP GET request—sent when the submit button is pressed—they appear attached to the second token of the first line of the request.
In simple cases the appended string begins with a ? character. This is followed by pairs of the form name=value, where name is the name appearing in the name attribute of the input tag, and value is the value entered by the user.
If the form has multiple input fields, the pairs are separated by & characters.
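The encoding described above can be sketched in plain Java. This is an illustration only, not part of the original slides: the class name QueryStringDemo, the path /cgi-bin/survey.pl, and the field name owner are hypothetical. It picks the second token out of the first request line, takes the text after the first ?, and splits it into name=value pairs separated by &.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class QueryStringDemo {

    /** Returns the text after the first '?' in the request URI, or "" if there is none. */
    static String queryOf(String requestLine) {
        // A request line looks like: GET /path?name=value HTTP/1.0
        String[] tokens = requestLine.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("malformed request line: " + requestLine);
        }
        String uri = tokens[1];              // the second token carries the form data
        int q = uri.indexOf('?');
        return q < 0 ? "" : uri.substring(q + 1);
    }

    /** Splits "a=1&b=2" into decoded name/value pairs; a repeated name keeps all its values. */
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;                    // nothing between two separators
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                // Browsers encode spaces as '+' and other special characters as %XX.
                name = URLDecoder.decode(name, "UTF-8");
                value = URLDecoder.decode(value, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);   // UTF-8 is always supported
            }
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical request line for a form with a multi-valued "pets" field.
        String line = "GET /cgi-bin/survey.pl?pets=dog&pets=cat&owner=Ann+Lee HTTP/1.0";
        System.out.println(parse(queryOf(line)));
    }
}
```

In a servlet this work is done by the container, so code like this would normally only be needed in a hand-written CGI program.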
The input to a text field can be masked by setting the type attribute to password. The entered text will not be echoed to the screen.
If the type attribute is set to hidden, the input field is not displayed at all. This kind of field is often used in HTML forms dynamically generated by CGI scripts.
Hidden fields allow the CGI scripts to keep track of “session” information over an interaction that involves multiple forms—hidden fields may contain values characterizing the session.– Use of hidden fields will be one of the topics in the
For long lists of options, when checkboxes become too tedious:
<select name=pets size=3 multiple>
  <option value=dog> Dog
  <option value=cat> Cat
  <option value=bird> Bird
  <option value=fish> Fish
</select>
The value attribute in the option tag is optional: the default value returned is the displayed string immediately following the tag.
Without the multiple attribute, only a single option can be selected.
In conventional CGI programming, the URL in the action attribute of a form will identify an executable file somewhere in the Web Server’s document hierarchy.
A common server convention is that these executables live in a subdirectory of cgi-bin/
The executable file may be written in any language. For definiteness we will assume it is written in Perl, and refer to it as a CGI script.
The Web Server program will invoke the CGI script, and pass it the form data, either through environment variables or by piping data to standard input of the script.
The CGI script generates a response to the form, which is piped to the Web server through its standard output, then returned to the browser.
At the most basic level, a CGI script must
– Parse the input (the form data) from the server, and
– Generate a response.
Most often the response is the text of a dynamically generated HTML document, preceded by some HTTP headers.
– In practice the only required HTTP header is the Content-type header. The Web Server will fill in other necessary headers automatically.
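As a rough illustration of the response format, here is a minimal Java sketch of the text a CGI program might write to its standard output. The class and method names are hypothetical, and the text above assumes Perl for real CGI scripts; Java is used here only to keep one language for the examples. The point is the Content-type header, followed by a blank line, followed by the HTML.

```java
final class CgiResponseDemo {

    /** Builds the text a CGI program would write to standard output. */
    static String htmlResponse(String title, String bodyHtml) {
        StringBuilder out = new StringBuilder();
        out.append("Content-type: text/html\n");   // the one header the script supplies
        out.append("\n");                          // an empty line ends the headers
        out.append("<html><head><title>").append(title).append("</title></head>\n");
        out.append("<body>").append(bodyHtml).append("</body></html>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // The Web server reads this from the program's standard output.
        System.out.print(htmlResponse("Thanks", "<p>Your form was received.</p>"));
    }
}
```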
Even if there is no meaningful response to the input data, the CGI script must output an empty message, or some error message.
– Otherwise the server will not close the connection to
Retrieving Form Data
Several environment variables are set up by the server to pass information about the request to the Perl script.
If the form data was sent using a GET request, the most important is QUERY_STRING, which contains all the text in the URL following the first ? character.
If the form data was sent using a POST request, the environment variable CONTENT_LENGTH contains the length in bytes of the posted data. To retrieve this data, these bytes are read from the standard input of the script.
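The two cases can be sketched as follows. This is a hedged Java illustration rather than the Perl the slides assume. The class name CgiInputDemo is hypothetical, and the sketch relies on the standard CGI variable REQUEST_METHOD (not mentioned above) to decide between the two cases. The environment and input stream are passed as arguments so the logic can be tried without a Web server.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class CgiInputDemo {

    /** Returns the raw (still URL-encoded) form data for a GET or POST request. */
    static String readFormData(Map<String, String> env, InputStream stdin) {
        String method = env.getOrDefault("REQUEST_METHOD", "GET");
        if (method.equalsIgnoreCase("POST")) {
            // POST: read exactly CONTENT_LENGTH bytes from standard input.
            int length = Integer.parseInt(env.getOrDefault("CONTENT_LENGTH", "0").trim());
            byte[] buf = new byte[length];
            int off = 0;
            try {
                while (off < length) {
                    int n = stdin.read(buf, off, length - off);
                    if (n < 0) {
                        break;               // input ended early
                    }
                    off += n;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return new String(buf, 0, off, StandardCharsets.ISO_8859_1);
        }
        // GET: everything after the first '?' arrives in QUERY_STRING.
        String query = env.get("QUERY_STRING");
        return query == null ? "" : query;
    }

    public static void main(String[] args) {
        // Simulated POST; a real CGI program would pass System.getenv() and System.in.
        Map<String, String> env = new HashMap<>();
        env.put("REQUEST_METHOD", "POST");
        env.put("CONTENT_LENGTH", "17");
        InputStream in = new ByteArrayInputStream(
                "pets=dog&pets=cat".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readFormData(env, in));
    }
}
```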
Standard Web servers typically need some additional software to allow them to run servlets. Options include:
– Apache Tomcat
  The official reference implementation for the servlet 2.2 and JSP 1.1 specifications. It can stand alone or be integrated into the Apache Web server.
– JavaServer Web Development Kit (JSWDK)
  A small standalone Web server mainly intended for servlet development.
– Sun’s Java Web server
  An early server supporting servlets. Now apparently obsolete.
– Allaire JRun, New Atlanta’s ServletExec, . . .
In these lectures we will use Apache Tomcat for examples.
For debugging of servlets it seems to be necessary to use a stand-alone server, dedicated to the application you are developing.
– The current architecture of servlets makes revision of servlet classes already loaded in a Web server either disruptive or expensive. In general you need to establish that your classes are working smoothly before they are deployed in a production server.
Hence you will be encouraged to install your own private server for developing Web applications.
Tomcat is the flagship product of the Jakarta project, which produces server software based on Java.
The system manager would like to be able to keep track of who is running what Web server.
– Also we want to avoid overloading the course hosts.
You will each be allocated a port number on one of the three course hosts. Please stick with this port number and host for your main server.
– You can run additional servers on random port numbers for brief experiments, but please not for extended periods.
– Of course avoid port numbers allocated to other students!
Your Tomcat home directory should be directly nested in your top-level home directory.
– The management reserves the right to read and modify your server configuration if it seems to be causing problems.
Edit the file jakarta-tomcat-X.X/conf/server.xml. Find the Connector element that defines the parameters of the HTTP connection handler. It looks like:
Removing the AJP Connector
In the file jakarta-tomcat-X.X/conf/server.xml you will also find a Connector element defining the parameters of an “AJP connection handler” (used for interactions with an Apache server). It looks like:
<Connector className=". . ."> <Parameter name="handler"
If you are using a course host, change the value of the port parameter from its default 8007 to a value unique to you, e.g. a port number one greater than your HttpConnectionHandler port.
Even if you are not going to use the Apache connection, the shutdown.sh script also uses this port, so the connection handler is still required.
If you are running your server on a course host, and your allocated host/port pair is host/XXXX, point your browser at the URL: http://host.csit.fsu.edu:XXXX
You should see the default Tomcat home page. In the Tomcat 3.1 release, the file for this home
Creating a Context
Before writing a servlet, you need a place to put it.
Shut down your server, if it is running.
In the file jakarta-tomcat-X.X/conf/server.xml, find the example Context elements. Add a new context element such as:
<Context path="/dbc" docBase="webapps/dbc/" debug="0" reloadable="true">
</Context>
– The path attribute defines a logical path that appears in the URL.
– The docBase attribute defines the physical directory where HTML and servlets live.
– Be careful to put /s in all the right places!
– The reloadable flag is supposed to allow servlet classes to be reloaded into a running server if they have been modified. We set it true because that is the recommended default during development. Note, however, it does not work very reliably!
This can be created as a subdirectory of jakarta-tomcat-X.X/webapps/
With the server configuration defined above, I create a subdirectory: jakarta-tomcat-X.X/webapps/dbc/
This will be the root directory for my HTML documents.
To check my configuration is working properly, I can put a file index.html in dbc/, restart my server, and point my browser at: http://host.csit.fsu.edu:XXXX/dbc
where host/XXXX is my host/port pair. I should see the contents of the HTML file.
This program should be contained in a file HelloWorld.java, which may be placed in the classes/ subdirectory.
HttpServlet is the base class for servlets running in HTTP servers. Although servlets can be written for other kinds of server, in reality servlets are nearly always HttpServlets.
The doGet() method is called in response to an HTTP GET request directed at the servlet.
As the names suggest, the arguments describe the browser’s request and the servlet’s response.
Before writing to the output stream associated with the response, the content type header (at least) must be set.
Note that by default the Tomcat server will run with the same privileges as the user who started it.
This means you don’t actually need to make files world readable (because you have privileges to read them).
It also means you have to be careful. If you stick with this default you must never deploy servlets that have the power to damage or compromise your account– e.g. by reading or writing arbitrary files, or executing random
Any servlet class implements the interface javax.servlet.Servlet.
This interface defines a few low-level methods, including the low-level request-handling method, service().
– Perhaps the only method from Servlet you will use explicitly is getServletConfig().
All servlets we will be concerned with are extended from the base class javax.servlet.http.HttpServlet (which implements Servlet).
Servlet Instances
By default, (at most) one instance of a given servlet class will ever be created by a Web server process.
By default, the servlet class is loaded into the Web server’s JVM, and the unique servlet instance is created, the first time any client sends a request to a URL identifying the servlet class.
Subsequent requests to the same URL are all handled by the same servlet class instance.
– By default, however, each request is handled in a different Java thread.
This means that a later request can access results of processing an earlier request through values of instance variables (or class variables).
The init() method:
  public void init() throws ServletException {. . .}
is quite analogous to the init() method on applets.
It is called once when the servlet is created. You override it to define initialization code for your servlet instance.
As with applets, this is used in preference to defining a non-default constructor, because you are allowed to access initialization parameters inside init() (but not in a constructor).
– There is another lower-level init() method:
    public void init(ServletConfig config) throws ServletException {. . .}
  Don’t override it. Instead, if you need a ServletConfig during initialization, call getServletConfig() in the body of the no-argument init().
Finally a servlet can also override
  public void destroy() {. . .}
If the Web server terminates gracefully, it will invoke destroy() on all servlet instances it holds before shutting down.
In principle, this is a place where you can put code to back-up the current state of the servlet to persistent storage. The servlet can restart from restored state when the Web server is restarted.
In practice, servers (especially Tomcat!) often terminate “ungracefully”, when the system crashes or the server process is killed. Relying on destroy() methods being called is probably not advisable.
), I get a response page containing the message: This servlet has been accessed 0 times
Each time I reload the URL, the count increases.
Since count is an instance variable of the class, this illustrates that indeed only a single instance of Counter is created.
This servlet is not completely reliable, because it is possible to have concurrent requests in different threads. The instance variable count is shared by threads. This could lead to problems of interference.
– servlet instance variables,
– servlet class variables,
– external files, etc
that may be modified by any HTTP request on the servlet, should be guarded by synchronized methods or a synchronized statement. This is very important!
For example, the increment of count could be done in a synchronized statement (note that the body of a synchronized statement must be a block):
  int myCount ;
  synchronized(this) { myCount = count++ ; }
Subsequently the local variable myCount—which is private to the thread—is printed in the response.
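To see the effect outside a Web server, the same pattern can be exercised with plain threads. The following is only a sketch: the class CounterDemo is hypothetical and stands in for the Counter servlet, with handleRequest() playing the role of doGet(). With the synchronized statement in place, no increments are lost, however the threads interleave.

```java
final class CounterDemo {

    private int count = 0;    // shared by all threads, like a servlet instance variable

    /** Handles one simulated request; returns the value that would be printed. */
    int handleRequest() {
        int myCount;
        synchronized (this) {
            myCount = count++;    // read and update as one indivisible step
        }
        return myCount;           // local variable, private to the calling thread
    }

    synchronized int current() {
        return count;
    }

    /** Runs the given number of threads, each issuing requestsPerThread requests. */
    static int runConcurrently(int threads, int requestsPerThread) {
        CounterDemo servlet = new CounterDemo();   // the single shared instance
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < requestsPerThread; j++) {
                    servlet.handleRequest();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return servlet.current();
    }

    public static void main(String[] args) {
        // With the synchronized statement the total is always threads * requestsPerThread.
        System.out.println(runConcurrently(8, 10000));
    }
}
```

Removing the synchronized statement from handleRequest() may make the final total come out short on some runs, which is the interference problem described above.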
The getMethod() method on the HttpServletRequest returns the HTTP method appearing in the header.
For example, when a client sends the HTTP HEAD request, the server is supposed to treat it like a GET request, but return the headers only—not the data.
The server will automatically discard any data doGet() returns, but (if you had the urge) you could make things a bit more efficient as follows: public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException,
Accept: MIME types the browser can handle
Accept-Charset: Character sets the browser can handle
Accept-Encoding: Encoding (e.g. gzip)
Accept-Language: English (en), etc.
Authorization: User ID/password
Cache-Control: For proxy servers.
Connection: Can the browser keep connections alive?
Content-Length: of POSTed data
Content-Type: MIME encoding
Cookie: Cookies previously received from this site.
Expect: Browser wishes to attach a document
From: email address of requester.
Host: host/port information on original URL
If-Match:
If-Modified-Since: only send recently changed data.
If-None-Match:
If-Range:
If-Unmodified-Since: Used with PUT.
Pragma: only standard value is no-cache.
Proxy-Authorization:
Range: Get part of document.
Referer: Set if was link from a Web page
Upgrade: Change protocol
User-Agent: Identifies browser
Via: Set by gateways and proxies
Warning:
If a form parameter can have more than one value (e.g. a value from a menu allowing multiple selections), you should apply the method:
  public String [] getParameterValues(String name) {. . .}
to the HttpServletRequest object. Recall this example from the section on forms:
<select name=pets size=3 multiple>
  <option value=dog> Dog
  <option value=cat> Cat
  <option value=bird> Bird
  <option value=fish> Fish
</select>
The form may send the data in a GET request to the following servlet.
This servlet will simply print the value of the Content-Type header, and the raw version of the posted data.
In general it is not safe to combine this style of reading data, using getReader(), with the higher-level approach, using getParameter()—choose one or the other.
HTTP Status Codes
100 Continue: Response to Expect request.
101 Switching Protocols: Response to Upgrade request.
200 OK: OK!
201 Created: Server created a document. URL follows.
202 Accepted: Processing is in progress.
203 Non-Authoritative Information:
204 No Content: No new document is available.
205 Reset Content: Clear form fields.
206 Partial Content: Response to Range request.
300 Multiple Choices: Trick question?
301 Moved Permanently: Document is elsewhere
302 Found: Redirects the browser to a different URL.
303 See Other: Please use GET instead of POST.
304 Not Modified: Response to request with If-Modified-Since.
305 Use Proxy: Go to proxy at returned URL
307 Temporary Redirect: like 302.
HTTP Status Codes (cont.)
400 Bad Request: Syntax error.
401 Unauthorized: No appropriate Authorization header
403 Forbidden: Not allowed with any authorization
404 Not Found: Not at this address.
405 Method Not Allowed: Self explanatory.
406 Not Acceptable: Resource doesn’t match Accept header.
407 Proxy Authentication Required:
408 Request Timeout: Client took too long sending request.
409 Conflict: Used with PUT.
410 Gone: Document has gone.
411 Length Required: Content-Length missing (in POST).
412 Precondition Failed:
413 Request Entity Too Large: Document too big to handle.
414 Request URI Too Long: URI is too long
415 Unsupported Media Type:
416 Requested Range Not Satisfiable:
417 Expectation Failed: Disillusioned?
Explicitly Returning Status Codes
These status values are available as predefined constants in the HttpServletResponse class:
  final int SC_OK = 200 ;
  final int SC_FOUND = 302 ;
  final int SC_NOT_FOUND = 404 ;
etc.
The default status is equivalent to explicitly doing:
  resp.setStatus(HttpServletResponse.SC_OK) ;
There are a couple of convenience methods on HttpServletResponse for dealing with common cases:
void sendError(int sc, String message)
– send specified status, with generated page containing message.
void sendRedirect(String location)
– send a temporary-redirect status (302, SC_FOUND, in the standard servlet API), and include
By sending the SC_FOUND or SC_TEMPORARY_REDIRECT status, together with a dynamically generated URL, a servlet can cause the browser to go directly to a different page or site (without the user manually clicking another link).
Following is a simplified version of an example from “Core Servlets and Java Server Pages”.
It allows the user to specify a search string and a preferred search engine, dynamically generates a query URL for the chosen search engine, and redirects the browser to that URL.
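The URL-generation step can be sketched in plain Java. This is not the book's code: the class name, the engine names alpha and beta, and their base URLs are all placeholders invented for illustration. In a servlet, the resulting string would be handed to sendRedirect(), and an unknown engine would typically lead to sendError() instead.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class SearchRedirectDemo {

    // Hypothetical engines: each base URL ends where the search string is appended.
    private static final Map<String, String> ENGINES = new LinkedHashMap<>();
    static {
        ENGINES.put("alpha", "http://search-alpha.example/find?q=");
        ENGINES.put("beta", "http://search-beta.example/query?text=");
    }

    /** Returns the URL to redirect to, or null if the engine name is not recognized. */
    static String redirectUrl(String engine, String searchString) {
        String base = ENGINES.get(engine);
        if (base == null) {
            return null;    // the caller would report an error instead of redirecting
        }
        try {
            // Encode spaces and special characters so the URL stays well formed.
            return base + URLEncoder.encode(searchString, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);    // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(redirectUrl("alpha", "java servlets"));
    }
}
```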
HTTP is a stateless protocol—it provides no intrinsic way to associate one request/response transaction with any subsequent transactions.
But very often a Web application requires that the server engage in a non-trivial dialog with a single user, involving multiple client requests and server responses.
So the problem is to find ways to define and keep track of a particular “session” between browser and Web server.
Solutions
There are three solutions in common use:
– Hidden Form Fields
  Assumes all client requests associated with the session are form submissions. The forms must be dynamically generated by the server, and include hidden input fields that preserve session information.
– URL-Rewriting
  Again assumes all pages associated with the session are dynamically generated by the server. Session information is directly appended to any URLs referring back to the server in the generated pages.
– Cookies
  An extension to HTTP allows a server to ask a browser to store a small amount of persistent information. The browser returns this information in HTTP request headers, typically whenever the client revisits a Web server on the same host.
The classic example of “session information” is the contents of a customer’s shopping cart at an online store.
In the interests of fitting code in slides, we scale this down and deal with selections from a virtual snack-vending machine.
– “. . . Think of clocks and counters and telephones and board games and vending machines.” C.A.R. Hoare, Communicating Sequential Processes, 1985.
If I click on a couple of the selections on the initial page, apparently nothing changes—each selection returns a generated page that looks identical in the browser.
The approach is quite elegant, but it has some problems:
– All interactions between client and server must go through forms.
– Every form on every generated page must include the hidden fields defining the session state.
– In our example, the number of hidden fields grew quickly.
All approaches to session tracking run into problems analogous to the last: one wishes to keep down the amount of hidden information that must be exchanged in every single transaction of a session.
For example, this will be important for the URL-rewriting approach, because we don’t want to end up with huge URLs.
Session IDs
The direct “hidden fields” approach does not store session state in any fixed place. The “state” is somehow encoded by the current point in an ongoing dialog.
– Perhaps reminiscent of simulation of state by lazy lists in functional programming languages??
This is interesting, but, as noted, it means that the associated information is constantly swapped between client and server.
An obvious solution is for the server to store the bulk of the data associated with each active session.
The only session information bounced back and forth between client and server is an immutable identifier for the session.
This is sufficiently safe if we make the often-reasonable assumption that there are no concurrently active transactions involving the same session.
– Without this assumption, access to the individual session records should be synchronized as well.
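A server-side session table along these lines might look as follows. This is a sketch under stated assumptions, not the VendingMachine2 code: the class and method names are hypothetical, the ID is generated with java.util.UUID, and for simplicity every access to the table and to the records is synchronized on the one table object, which also covers the case of concurrent requests within a session.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class SessionTableDemo {

    // Session ID -> selections made so far in that session.
    private final Map<String, List<String>> sessionTable = new HashMap<>();

    /** Creates an empty session record and returns its immutable identifier. */
    synchronized String newSession() {
        String id = UUID.randomUUID().toString();
        sessionTable.put(id, new ArrayList<>());
        return id;
    }

    /** Adds a selection; returns false if the ID is unknown (e.g. after a restart). */
    synchronized boolean addSelection(String id, String item) {
        List<String> record = sessionTable.get(id);
        if (record == null) {
            return false;
        }
        record.add(item);
        return true;
    }

    /** Returns a copy of the selections, or null if the ID is unknown. */
    synchronized List<String> selections(String id) {
        List<String> record = sessionTable.get(id);
        return record == null ? null : new ArrayList<>(record);
    }

    public static void main(String[] args) {
        SessionTableDemo table = new SessionTableDemo();
        String id = table.newSession();     // only this ID travels to the browser
        table.addSelection(id, "chips");
        table.addSelection(id, "chocolate");
        System.out.println(table.selections(id));
    }
}
```

Only the ID string needs to be sent to the browser; the list of selections stays on the server.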
The selection-viewing servlet can access the session table in the first servlet class by VendingMachine2.sessionTable.
Our simplified implementation will fail ungraciously if the server is restarted while a browser is in the middle of a session.
The session record disappears, while the session ID may still be stored in the browser.
Unless session data is stored persistently there is no completely satisfactory solution, but a servlet writer should be aware of this possibility, and code defensively (perhaps sending an explanatory message to the browser).
URL-rewriting can be regarded as an optimization of the hidden fields approach.
Assuming a form with a hidden field is submitted using the GET method, what the server really sees is just a request whose URI has been extended with an encoding of the value in the hidden field.
In URL-rewriting we cut out the role of the browser (encoding session data from hidden fields) and directly extend the URL in the action attribute of the form with an encoding of the session data.
As a byproduct, this also works for URLs in anchor elements (simple hypertext links).
The session ID information is appended to the servlet URL in the action attribute of the forms (recall selectURL is the URL of this servlet).
On any invocation, after the first in the session, this information can be retrieved using getPathInfo().
The getPathInfo() method on HttpServletRequest returns any text in the request URL following the servlet name (up to and excluding the ? that delimits the query string, if there is one).
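The round trip can be imitated with plain string handling. This is a sketch only: the exact encoding used in the lecture example is not shown here, so appending the ID as an extra path segment ("/" followed by the ID) is an assumption, as are the class and method names. The pathInfo() helper mimics what getPathInfo() is described as returning; it is not the servlet API itself.

```java
final class UrlRewriteDemo {

    /** Extends the servlet URL with the session ID (assumed format: extra path segment). */
    static String rewrite(String servletUrl, String sessionId) {
        return servletUrl + "/" + sessionId;
    }

    /**
     * Mimics getPathInfo(): the text after the servlet path, up to and
     * excluding any '?', or null if there is no extra path text.
     */
    static String pathInfo(String requestUri, String servletPath) {
        int q = requestUri.indexOf('?');
        String path = q < 0 ? requestUri : requestUri.substring(0, q);
        if (!path.startsWith(servletPath)) {
            return null;
        }
        String extra = path.substring(servletPath.length());
        return extra.isEmpty() ? null : extra;
    }

    /** Recovers the session ID from the path info, or null on the first invocation. */
    static String sessionIdFrom(String pathInfo) {
        if (pathInfo == null || pathInfo.length() <= 1) {
            return null;
        }
        return pathInfo.startsWith("/") ? pathInfo.substring(1) : pathInfo;
    }

    public static void main(String[] args) {
        String servletPath = "/dbc/servlet/VendingMachine";   // hypothetical servlet URL
        String action = rewrite(servletPath, "abc123");       // goes into the generated form
        String info = pathInfo(action + "?item=chips", servletPath);
        System.out.println(sessionIdFrom(info));
    }
}
```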
We now have the option to replace the form that connects to a selection-viewing servlet with a simple anchor element.
– The URL in the anchor element is extended with the session ID information, just like the action attribute in a form.
Recognizing a regular customer
– A persistent cookie can save some identification information for the particular customer. The stored information may be actual name and details, or (preferably) some key into a database on the server.
– When the customer returns to the site, associated information (mailing address, etc) is already known; it doesn’t have to be entered anew by the customer.
– There are many variations on this theme, e.g. it allows portal sites to do focussed advertising.
Session Tracking
– Within the context of a single “visit” to a site, cookies can be used as an alternative to hidden fields or URL-rewriting, as the underlying mechanism for session tracking.
Abuses of Cookies
A poorly constructed commercial site might use cookies to store sensitive information (e.g. credit card numbers) on the hard disk of your PC. This might be a privacy problem if the PC is shared by several users.
A Web site can persuade a browser to send a cookie to a third party site, by embedding an image that comes from the Web server of the third party.
– The third party site might offer the original site collated information on its visitors.
– It may be a particular nuisance if the third party has previously harvested the email address of the user, e.g. by sending them an HTML email containing a cookie-setting icon.
– Moral: configure your browser to only send cookies to the actual page you are visiting?
Typically a browser will restrict the number and size of cookies it will accept, e.g.:
– Maximum of 20 cookies per site,
– Maximum of 300 cookies total (from all sites),
– Maximum size of individual cookie is 4 kilobytes.
Users may of course configure their browsers to refuse all cookies, or only accept selected cookies.
Hence a Web application should not rely on cookies for basic functionality—only for “added value”.
The servlet creates a cookie by using a constructor for the class Cookie.
Various attributes can be set for the cookie before sending it to the client. They include:
– The name and value of the cookie.
  These are usually set in the Cookie constructor.
– The domain to which the cookie should be returned.
  By default the cookie will only be returned to the server that sent it, but this default can be overridden.
– The URI path to which the cookie should be returned.
  By default, the cookie is only returned to pages in the same directory as the page that sent the cookie.
– The time when a persistent cookie expires
  e.g., the cookie should be deleted by the browser after
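The attributes listed above can be tried out without a servlet container by using java.net.HttpCookie from the standard library. Note that this is a different class from the servlet API's Cookie, although it models the same attributes. The cookie name, domain, path, and one-week lifetime below are illustrative assumptions, not values from the lecture.

```java
import java.net.HttpCookie;

final class CookieAttributesDemo {

    /** Builds a persistent cookie holding a key into a server-side customer database. */
    static HttpCookie customerCookie(String customerKey) {
        HttpCookie cookie = new HttpCookie("customer", customerKey);  // name and value
        cookie.setDomain(".example.com");      // hosts the cookie should be returned to
        cookie.setPath("/shop");               // URI path the cookie should be returned to
        cookie.setMaxAge(7L * 24 * 60 * 60);   // lifetime in seconds (one week)
        return cookie;
    }

    public static void main(String[] args) {
        HttpCookie c = customerCookie("k42");
        System.out.println(c.getName() + " = " + c.getValue());
        System.out.println("domain:  " + c.getDomain());
        System.out.println("path:    " + c.getPath());
        System.out.println("max age: " + c.getMaxAge() + " s");
    }
}
```

Storing a database key rather than the customer's details keeps sensitive information off the client machine.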
After deploying this servlet, we can view the headers it returns by modifying the TrivialBrowser class (introduced in the network programming lecture) to take host, port, and path arguments. . .
We have already illustrated several underlying approaches to session tracking:– Hidden fields, URL-rewriting, cookies.
In general an application has to make a choice between these mechanisms, taking into account support in server and browser.
Session cookies may be favored on grounds of generality and flexibility, but not all clients will accept them.
In practice the servlet programmer does not have to worry too much about these issues. A high-level API is provided that will transparently choose and deploy a suitable low-level tracking mechanism.
The true argument of getSession() means that a new session object will be created if one does not already exist.
– For robustness you should probably always use this argument. (The documentation says that the form of getSession() without an argument is equivalent. Experience suggests maybe not?)
This servlet simply outputs a link back to itself (it doesn’t explicitly use the session object).
One important thing to note is the call to encodeURL().
This method should be applied to any URLs in the generated page that refer back to the same servlet context.
This supports URL-rewriting (if this is the session-tracking strategy adopted for the session).
Pointing the browser at this page, we see a page containing a link: View servlet again
If we view the HTML source of this generated page, we may see something like:
<html><head></head><body>
<a href=http://sirah.csit.fsu.edu:8081/dbc/servlet/GetSession;jsessionid=To1019mC0 . . . 365At > View servlet again </a>
</body></html>
The URL in this first generated page has been rewritten to include an attribute jsessionid. The associated value is a long, random-looking string.
Of course this is not particularly useful unless we have a way to associate application information with the session.
In previous examples we used a HashMap, keyed by session ID, to store session data.
We may assume that analogous mechanisms are used behind the scenes in the session-tracking API, but the session ID is not usually directly accessed by the programmer.
Instead, the application programmer just sees the HttpSession object. Methods are available to directly “cache” information in this object.– The session object itself behaves like a simple
In well-written Java programs, local variables are normally declared inside methods to hold values that are computed and used by only a single method invocation.
Typically, instance variables are used to hold values that need to be shared across multiple invocations.
In servlet programming—where several sessions may be concurrently operating on the single servlet instance—this role for instance variables is naturally taken over by attributes of the session object.
Think hard before declaring an instance variable in a servlet. In many cases you should probably be using a session attribute instead.