SDPL 2002 Notes 3: XML Processor Interfaces 1
3. XML Processor APIs
How can applications manipulate structured documents?
– An overview of document parser interfaces
3.1 SAX: an event-based interface
3.2 DOM: an object-based interface
3.3 JAXP: Java API for XML Processing
Every XML application contains some kind of a parser
– editors, browsers
– transformation/style engines, DB loaders, ...
XML parsers are becoming standard tools of application development frameworks
– JDK v. 1.4 contains JAXP, with its default parser (Apache Crimson)
(See, e.g., Leventhal, Lewis & Fuchs: Designing XML Internet Applications, Chapter 10, and D. Megginson: Events vs. Trees)
I: Event-based interfaces
– Command line and ESIS interfaces
  » Element Structure Information Set, traditional interface to stand-alone SGML parsers
Application implements a set of callback methods for handling parse events
– parser notifies the application by method calls
– method parameters qualify events further
  » element type name
  » names and values of attributes
  » values of content strings, …
Idea behind "SAX" (Simple API for XML)
– an industry standard API for XML parsers
– could be thought of as "Serial Access XML"
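The callback idea can be illustrated with a minimal sketch. The class name EventTrace and the helper trace are hypothetical and not part of the original notes. The parser is obtained through JAXP, and the handler simply records each event together with the parameters that qualify it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: records every parse event the parser reports.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The parameters qualify the event: element type name and attributes.
        StringBuilder sb = new StringBuilder("start " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        events.add(sb.toString());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Content strings arrive as a slice of the parser's buffer.
        events.add("text " + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    // Parses the given XML text and returns the recorded events in order.
    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        EventTrace handler = new EventTrace();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<A i=\"1\">Hi!</A>")) {
            System.out.println(event);
        }
    }
}
```

Note that the application never sees a tree: it only receives the events in document order and must keep any state it needs itself.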
An event call-back application
Application interacts with an object-oriented representation of
– the parser
– the document parse tree consisting of objects like document, element, attribute, text, …
Abstraction level higher than in event based interfaces; more powerful access
– to descendants, following siblings, …
An Object-Model Based Application
[Figure: the Application asks a Parser Object to parse the document <A i="1">Hi!</A>; the parser builds an In-Memory Document Representation (Document, element A with attribute i=1, text "Hi!"), which the application can then access and modify.]
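The object-model style can be sketched with the JAXP DOM API. The class name DomSketch is hypothetical. The sketch parses the small document from the figure, reads the attribute and text from the in-memory tree, and then modifies the tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration of an object-model based application.
class DomSketch {
    // Parse: the parser object builds the in-memory document representation.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<A i=\"1\">Hi!</A>");
        Element a = doc.getDocumentElement();

        // Access: navigate the tree instead of reacting to events.
        System.out.println(a.getTagName() + " i=" + a.getAttribute("i")
                + " text=" + a.getFirstChild().getNodeValue());

        // Modify: the tree can be changed in place after parsing.
        a.setAttribute("i", "2");
        a.setTextContent("Bye!");
        System.out.println(a.getTagName() + " i=" + a.getAttribute("i")
                + " text=" + a.getTextContent());
    }
}
```

The price of this convenience is that the whole document is held in memory before the application sees any of it.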
3.1 The SAX Event Callback API
A de-facto industry standard
– Not an official standard or W3C Recommendation
– Developed by members of the xml-dev mailing list
– Version 1.0 in May 1998, Vers. 2.0 in May 2000
– Not a parser, but a common interface for many different parsers (like JDBC is a common interface to various RDBs)
Supported directly by major XML parsers
– most Java based and free: Sun JAXP, IBM XML4J, Oracle's XML Parser for Java, Apache Xerces; MSXML (in IE 5), James Clark's XP
SAX 2.0 Interfaces
Interplay between an application and a SAX-conformant parser specified in terms of interfaces (i.e., collections of methods)
Classification of SAX interfaces:
– Parser-to-application (or call-back) interfaces
  » to attach special behaviour to parser-generated events
– Application-to-parser
  » to use the parser
– Auxiliary
  » to manipulate parser-provided information
Call-Back Interfaces
Implemented by application to override default behaviour (of ignoring any event quietly)
– ContentHandler
  » methods to process document parsing events
– DTDHandler
  » methods to receive notification of unparsed external entities and their notations declared in the DTD
– ErrorHandler
  » methods for handling parsing errors and warnings
– EntityResolver
  » methods for customised processing of external entity references
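As one example of a call-back interface, an ErrorHandler might be implemented as in the following sketch. The class name CollectingErrorHandler and the helper check are hypothetical. The handler collects messages instead of ignoring them, and rethrows fatal errors so that parsing stops.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical illustration: an ErrorHandler that records what the parser reports.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        messages.add("warning line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable errors, e.g. validity violations.
        messages.add("error line " + e.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Well-formedness violations; rethrow to abort the parse.
        messages.add("fatalError line " + e.getLineNumber());
        throw e;
    }

    // Parses the text and returns whatever the error handler collected.
    static List<String> check(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError above.
        }
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<A><B></A>"));   // mismatched end tag
        System.out.println(check("<A><B/></A>"));  // well-formed
    }
}
```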
Attributes
– methods to access a list of attributes
Locator
– methods for locating the origin of parse events (e.g. systemID, line and column numbers, say, for reporting semantic errors controlled by the application)
The ContentHandler Interface
Methods for receiving information of general document events. (See API documentation for a complete list):
setDocumentLocator(Locator locator)
– Receive an object for locating the origin of SAX document events (e.g. for reporting semantic errors controlled by the application)
startDocument(); endDocument()
– notification of the beginning/end of a document.
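A sketch of how setDocumentLocator might be used: the handler stores the Locator and consults it inside later call-backs. The class name LocatingHandler and the helper locate are hypothetical. A parser is not required to supply a Locator, so the sketch guards against its absence.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: records the line on which each start tag ends.
class LocatingHandler extends DefaultHandler {
    private Locator locator;  // supplied by the parser before startDocument()
    final List<String> positions = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The locator is only valid during a call-back; read it immediately.
        int line = (locator != null) ? locator.getLineNumber() : -1;
        positions.add(qName + "@" + line);
    }

    static List<String> locate(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LocatingHandler handler = new LocatingHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.positions;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(locate("<A>\n<B/>\n</A>"));
    }
}
```

An application could use the same stored Locator to attach line numbers to its own semantic error messages.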
A SAX invocation of startElement for xsl:template would pass the following parameters:
A SAX invocation of startElement for html would give
– namespaceURI = http://www.w3.org/TR/xhtml1/strict (as default namespace for element names without a prefix), localname = html, qName = html
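The namespace-related parameters of startElement can be observed with a sketch like the following. The class name NameReporter and the sample stylesheet are assumptions made for illustration. The sample binds the xsl prefix to the standard XSLT namespace and uses the XHTML URI from these notes as the default namespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: lists namespaceURI|localName|qName per element.
class NameReporter extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName, String qName,
                             Attributes atts) {
        names.add(namespaceURI + "|" + localName + "|" + qName);
    }

    static List<String> report(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // without this, URI and local name may be empty
        XMLReader reader = factory.newSAXParser().getXMLReader();
        NameReporter handler = new NameReporter();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns=\"http://www.w3.org/TR/xhtml1/strict\">"
            + "<xsl:template match=\"/\"><html/></xsl:template>"
            + "</xsl:stylesheet>";
        for (String name : report(xml)) {
            System.out.println(name);
        }
    }
}
```

For the xsl:template element the three values differ (namespace URI, local name template, qualified name xsl:template), whereas for the unprefixed html element the local and qualified names coincide.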
characters(char ch[], int start, int length)
– notification of character data.
ignorableWhitespace(char ch[], int start, int length)
– notification of ignorable whitespace in element content.
<!DOCTYPE A [<!ELEMENT A (B)>
             <!ELEMENT B (#PCDATA)> ]>
<A>
  <B> </B>
</A>
Ignorable whitespace: the whitespace between the tags of A and B (A has element content)
Text content: the whitespace inside B (B has #PCDATA content)
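The distinction can be checked with a small sketch that counts the characters reported through each call-back. The class name WhitespaceCounter is hypothetical. The sketch turns validation on, on the assumption that a validating parser is the surest way to have the DTD's content models taken into account; a parser that does not read the DTD may report all whitespace through characters instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: counts characters per call-back type.
class WhitespaceCounter extends DefaultHandler {
    int ignorableChars = 0;  // whitespace in element content
    int textChars = 0;       // ordinary character data

    @Override
    public void characters(char[] ch, int start, int length) {
        textChars += length;
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableChars += length;
    }

    static WhitespaceCounter count(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);  // make the parser use the DTD's content models
        WhitespaceCounter handler = new WhitespaceCounter();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE A [<!ELEMENT A (B)> <!ELEMENT B (#PCDATA)> ]>\n"
            + "<A>\n  <B> </B>\n</A>";
        WhitespaceCounter c = count(xml);
        System.out.println("ignorable: " + c.ignorableChars + ", text: " + c.textChars);
    }
}
```

Counting characters rather than calls is deliberate, since a parser is free to deliver a run of characters in several chunks.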
SAX Processing Example (1)
Input: XML representation of a personnel database:
Solution strategy using event-based processing:
– at the start of a person, record the idnum (e.g., 1234)
– keep track of starts and ends of elements last and first, in order to record content of those elements (e.g., "Kilpeläinen" and "Pekka")
– at the end of each person, output the collected data
SAX Processing Example (3)
Application: Begin by importing relevant classes:
    if (localName.equals("first")) InFirst = true;
    if (localName.equals("last")) InLast = true;
    if (localName.equals("person")) IdNum = atts.getValue("idnum");
} // startElement
SAX Processing Example (6)
Call-back methods continue:
– Record the text content of elements first and last in corresponding variables:
public void characters(char ch[], int start, int length) {
    if (InFirst) FirstName = new String(ch, start, length);
    if (InLast) LastName = new String(ch, start, length);
} // characters
SAX Processing Example (7)
Call-back methods continue:
– at an exit from person, output the collected data:
public void endElement(String namespaceURI, String localName, String qName) {
    if (localName.equals("person"))
SAX Processing Example (9)
Main method continues:
// Instantiate and pass a new ContentHandler to xmlReader:
ContentHandler handler = new SAXDBApp();
xmlReader.setContentHandler(handler);
for (int i = 0; i < args.length; i++) {
    xmlReader.parse(args[i]);
}
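Since only fragments of the example survive in this transcript, the following is a self-contained sketch of how the pieces might fit together, not the original SAXDBApp. The class name PersonnelSketch, the input layout (a db root with person elements carrying an idnum attribute and last and first children) and the output format are all assumptions. Unlike the fragments above, the sketch accumulates text in StringBuilders, because a SAX parser may deliver the content of one element in several characters calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reconstruction of the personnel example; names and format are assumed.
class PersonnelSketch extends DefaultHandler {
    private boolean inFirst = false, inLast = false;
    private String idNum;
    private final StringBuilder firstName = new StringBuilder();
    private final StringBuilder lastName = new StringBuilder();
    final List<String> rows = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName, String qName,
                             Attributes atts) {
        if (localName.equals("first")) { inFirst = true; firstName.setLength(0); }
        if (localName.equals("last")) { inLast = true; lastName.setLength(0); }
        if (localName.equals("person")) idNum = atts.getValue("idnum");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append rather than assign: text may arrive in more than one chunk.
        if (inFirst) firstName.append(ch, start, length);
        if (inLast) lastName.append(ch, start, length);
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) {
        if (localName.equals("first")) inFirst = false;
        if (localName.equals("last")) inLast = false;
        if (localName.equals("person")) rows.add(idNum + ": " + firstName + " " + lastName);
    }

    static List<String> extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // so that localName is filled in
        XMLReader xmlReader = factory.newSAXParser().getXMLReader();
        PersonnelSketch handler = new PersonnelSketch();
        xmlReader.setContentHandler(handler);
        xmlReader.parse(new InputSource(new StringReader(xml)));
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<db><person idnum=\"1234\"><last>Kilpel\u00e4inen</last>"
                   + "<first>Pekka</first></person></db>";
        for (String row : extract(xml)) System.out.println(row);
    }
}
```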