5 Processing XML
5 Processing XML 5 - 2 Parsing XML documents Document Object Model (DOM) Simple API for XML (SAX) Class generation Overview.

Dec 27, 2015

Martin Ford
5

Processing XML

5 - 2

Overview

Parsing XML documents
  Document Object Model (DOM)
  Simple API for XML (SAX)
Class generation

5 - 3

What's the Problem?

<?xml version="1.0"?>
<books>
  <book>
    <title>The XML Handbook</title>
    <author>Goldfarb</author>
    <author>Prescod</author>
    <publisher>Prentice Hall</publisher>
    <pages>655</pages>
    <isbn>0130811521</isbn>
    <price currency="USD">44.95</price>
  </book>
  <book>
    <title>XML Design</title>
    <author>Spencer</author>
    <publisher>Wrox Press</publisher>
    ...
  </book>
</books>

[Figure: the XML document next to a "Book" object, connected by question marks.]

5 - 4

Parsing XML Documents

[Figure: a parser reads the document together with its DTD / schema. DOM: the parser builds a document tree. SAX: the application implements DocumentHandler, and the parser calls startDocument, startElement, endElement, and endDocument on it.]

5 - 5

Parser

Project X (Sun Microsystems)
Ælfred (Microstar Software)
XML4J (IBM)
Lark (Tim Bray)
MSXML (Microsoft)
XJ (Data Channel)
Xerces (Apache)
...

5 - 6

The Document Object Model

XML Document Structure

<?xml version="1.0"?>
<books>
  <book>
    <title>The XML Handbook</title>
    <author>Goldfarb</author>
    <author>Prescod</author>
    <publisher>Prentice Hall</publisher>
    <pages>655</pages>
    <isbn>0130811521</isbn>
    <price currency="USD">44.95</price>
  </book>
  <book>
    <title>XML Design</title>
    <author>Spencer</author>
    <publisher>Wrox Press</publisher>
    ...
  </book>
</books>

[Figure: the document drawn as a tree. The root "books" has "book" children. The first "book" has the children title, author, publisher, pages, isbn, ..., with the text values "The XML Handbook", "Goldfarb", "Prescod", "Prentice Hall", and "655" below them.]

5 - 7

The Document Object Model

Provides a standard interface for access to and manipulation of XML structures.

Represents documents in the form of a hierarchy of nodes.

Is platform- and programming-language-neutral.

Is a W3C Recommendation (DOM Level 1, October 1, 1998).

Is implemented by many parsers.
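
Because the DOM is language-neutral, the same interface is available outside the browser. The following is a minimal sketch using xml.dom.minidom from Python's standard library (one DOM implementation among many, chosen here only for illustration). The compact sample string and the helper name outline are assumptions, not part of the original slides. It shows the "hierarchy of nodes" idea by printing one indented line per node.

```python
from xml.dom import Node, minidom

# Compact sample based on the books document (no whitespace between tags,
# so the tree contains no whitespace-only text nodes).
BOOKS_XML = (
    '<?xml version="1.0"?>'
    "<books><book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author>"
    "<publisher>Prentice Hall</publisher></book></books>"
)

doc = minidom.parseString(BOOKS_XML)  # parse the whole document into a tree
root = doc.documentElement            # the <books> element


def outline(node, depth=0):
    """Return one indented line per node to make the hierarchy visible."""
    if node.nodeType == Node.TEXT_NODE:
        label = node.nodeValue        # text nodes carry their content here
    else:
        label = node.nodeName         # elements carry their tag name here
    lines = ["  " * depth + label]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))
```

Note that minidom keeps whitespace-only text nodes when the source is indented, which is why the sample avoids whitespace between tags.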

5 - 8

DOM - Structure Model

[Figure: the books tree (books, book, title, author, publisher, pages, isbn, ...), labeled with the DOM interfaces that represent its parts: Document, Node, NodeList, and Element.]

5 - 9

The Document Interface

Method                          Result
doctype                         DocumentType
implementation                  DOMImplementation
documentElement                 Element
getElementsByTagName(String)    NodeList
createTextNode(String)          Text
createComment(String)           Comment
createElement(String)           Element
createCDATASection(String)      CDATASection

5 - 10

The Node Interface

Method                               Result
nodeName                             String
nodeValue                            String
nodeType                             short
parentNode                           Node
childNodes                           NodeList
firstChild                           Node
lastChild                            Node
previousSibling                      Node
nextSibling                          Node
attributes                           NamedNodeMap
insertBefore(Node new, Node ref)     Node
replaceChild(Node new, Node old)     Node
removeChild(Node)                    Node
hasChildNodes()                      boolean

5 - 11

Node Types / Node Names

Result: NodeType / NodeName

Field                          Node type   Node name
ELEMENT_NODE                   1           tagName
ATTRIBUTE_NODE                 2           name of attribute
TEXT_NODE                      3           "#text"
CDATA_SECTION_NODE             4           "#cdata-section"
ENTITY_REFERENCE_NODE          5           name of entity referenced
ENTITY_NODE                    6           entity name
PROCESSING_INSTRUCTION_NODE    7           target
COMMENT_NODE                   8           "#comment"
DOCUMENT_NODE                  9           "#document"
DOCUMENT_TYPE_NODE             10          document type name
DOCUMENT_FRAGMENT_NODE         11          "#document-fragment"
NOTATION_NODE                  12          notation name
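
To make the type and name columns concrete, here is a small sketch that inspects a few nodes with Python's xml.dom.minidom. The one-line sample (including its comment) is an assumption made up for this illustration; the numeric constants come from xml.dom.Node.

```python
from xml.dom import Node, minidom

# Hypothetical one-element sample: an attribute, a text node, and a comment.
doc = minidom.parseString(
    '<price currency="USD">44.95<!-- list price --></price>'
)
price = doc.documentElement
text, comment = price.childNodes          # the two children, in document order
attr = price.getAttributeNode("currency")

# Print nodeType and nodeName for each kind of node.
for node in (doc, price, attr, text, comment):
    print(node.nodeType, node.nodeName)

# The numeric codes are also available as named constants.
print(price.nodeType == Node.ELEMENT_NODE)
```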

5 - 12

The NodeList Interface

Method       Result
length       int
item(int)    Node

5 - 13

The Element Interface

Method                                      Result
tagName                                     String
getAttribute(String)                        String
setAttribute(String name, String value)     (none)
removeAttribute(String)                     (none)
getAttributeNode(String)                    Attr
setAttributeNode(Attr)                      Attr
removeAttributeNode(Attr)                   Attr
getElementsByTagName(String)                NodeList
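
A short sketch of the attribute methods, again using Python's xml.dom.minidom as a stand-in DOM implementation. The "vat" attribute is hypothetical and exists only to show setAttribute and removeAttributeNode.

```python
from xml.dom import minidom

doc = minidom.parseString('<price currency="USD">44.95</price>')
price = doc.documentElement

print(price.tagName)                     # name of the element
print(price.getAttribute("currency"))    # attribute value as a string

price.setAttribute("currency", "EUR")    # overwrite an existing attribute
price.setAttribute("vat", "included")    # hypothetical new attribute

vat = price.getAttributeNode("vat")      # the attribute as an Attr node
price.removeAttributeNode(vat)           # remove by node ...
price.removeAttribute("currency")        # ... or remove by name

print(price.toxml())                     # serialized element without attributes
```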

5 - 14

DOM Methods for Navigation

parentNode
firstChild, lastChild
previousSibling, nextSibling
childNodes (length, item())
getElementsByTagName
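
The navigation properties can be tried out on a small tree. This is a sketch with Python's xml.dom.minidom; the variable names are illustrative, and the sample is written without whitespace between tags so that firstChild and nextSibling land on elements rather than on whitespace text nodes.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<books>"
    "<book><title>The XML Handbook</title>"
    "<author>Goldfarb</author><author>Prescod</author></book>"
    "<book><title>XML Design</title><author>Spencer</author></book>"
    "</books>"
)
root = doc.documentElement

first_book = root.firstChild              # down to the first child
last_book = root.lastChild                # down to the last child
title = first_book.firstChild
first_author = title.nextSibling          # sideways to the next node
authors = first_book.getElementsByTagName("author")  # search by tag name
second_author = authors.item(1)           # NodeList access by index

print(root.childNodes.length)             # number of book elements
print(first_author.firstChild.data)       # text of the first author
print(second_author.firstChild.data)      # text of the second author
print(second_author.previousSibling is first_author)
print(second_author.parentNode is first_book)        # back up the tree
print(last_book.firstChild.firstChild.data)          # title of the last book
```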

5 - 15

DOM Methods for Manipulation

Changing the tree: appendChild, insertBefore, replaceChild, removeChild

Creating nodes: createElement, createAttribute, createTextNode
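
The following sketch exercises each of these methods once, using Python's xml.dom.minidom. The starting document, the placeholder title "untitled", the "note" element, and the "lang" attribute are assumptions invented for the illustration; the author and publisher values are taken from the books sample.

```python
from xml.dom import minidom

# Hypothetical starting point: a book with a placeholder title and a stray note.
doc = minidom.parseString(
    "<books><book><title>untitled</title><note/></book></books>"
)
book = doc.documentElement.firstChild
title = book.firstChild

# createElement + createTextNode + appendChild: add a publisher at the end.
publisher = doc.createElement("publisher")
publisher.appendChild(doc.createTextNode("Wrox Press"))
book.appendChild(publisher)

# insertBefore: place the author in front of the publisher.
author = doc.createElement("author")
author.appendChild(doc.createTextNode("Spencer"))
book.insertBefore(author, publisher)

# replaceChild: swap the placeholder text for the real title.
title.replaceChild(doc.createTextNode("XML Design"), title.firstChild)

# removeChild: drop the stray <note/> element.
book.removeChild(book.getElementsByTagName("note").item(0))

# createAttribute: build an Attr node and attach it to the element.
lang = doc.createAttribute("lang")
lang.value = "en"
book.setAttributeNode(lang)

print(doc.documentElement.toxml())
```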

5 - 16

Example

[Figure: tree with the root "books" and two "book" children. The first book has the authors "Goldfarb" and "Prescod", the second has the author "Spencer".]

doc.documentElement.childNodes.item(0).getElementsByTagName("author").item(1).childNodes.item(0).data

Reading the expression from left to right:

doc                                DOM object
.documentElement                   root node
.childNodes                        books
.item(0)                           first book
.getElementsByTagName("author")    authors
.item(1)                           second author
.childNodes                        subnodes
.item(0)                           first thereof
.data                              text

5 - 17

Script

<HTML>
<HEAD><TITLE>DOM Example</TITLE></HEAD>
<BODY>
<H1>DOM Example</H1>
<SCRIPT LANGUAGE="JavaScript">
  // Runs only in Internet Explorer, which provides the MSXML parser via ActiveX.
  var doc, root, book1, authors, author2;
  doc = new ActiveXObject("Microsoft.XMLDOM");
  doc.async = false;                      // load synchronously
  doc.load("books.xml");
  if (doc.parseError.errorCode != 0) {    // non-zero error code: parsing failed
    alert(doc.parseError.reason);
  } else {
    root = doc.documentElement;
    document.write("Name of Root node: " + root.nodeName + "<BR>");
    document.write("Type of Root node: " + root.nodeType + "<BR>");
    book1 = root.childNodes.item(0);      // first book
    authors = book1.getElementsByTagName("author");
    document.write("Number of authors: " + authors.length + "<BR>");
    author2 = authors.item(1);            // second author
    document.write("Name of second author: " + author2.childNodes.item(0).data);
  }
</SCRIPT>
</BODY>
</HTML>

5 - 18

SAX - Simple API for XML

[Figure: the parser reads the document and its DTD and calls startDocument, startElement, endElement, and endDocument on the application.]

5 - 19

SAX - Simple API for XML

Event-driven parsing model
"Don't call the DOM, the parser calls you."
Developed by the members of the XML-DEV mailing list
Released on May 11, 1998
Supported by many parsers ...
... but Ælfred is the Saxon king.
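
To illustrate "the parser calls you", here is a minimal sketch with Python's xml.sax module. The class name TraceHandler and the one-author sample are assumptions for this illustration. Python's binding follows SAX 2, so the handler base class is ContentHandler rather than the SAX 1 DocumentHandler named in the parsing diagram.

```python
import xml.sax


class TraceHandler(xml.sax.ContentHandler):
    """Records every callback the parser makes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append(f"startElement {name}")

    def endElement(self, name):
        self.events.append(f"endElement {name}")

    def characters(self, content):
        # A parser may deliver text in several chunks; record each one.
        self.events.append(f"characters {content}")


handler = TraceHandler()
# The application never walks a tree; the parser drives the handler.
xml.sax.parseString(
    b"<books><book><author>Goldfarb</author></book></books>", handler
)
print("\n".join(handler.events))
```

No tree is built at any point, so memory use does not grow with document size; the price is that the application must keep track of any context it needs itself.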

5 - 20

Procedure

DOM
  1. Create a parser instance
  2. Parse the whole document
  3. Process the DOM tree

SAX
  1. Create a parser instance
  2. Register event handlers with the parser
  3. The parser calls the event handlers during parsing
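
The two procedures can be put side by side on one small task, counting author elements. This is a sketch using Python's standard library; the function and class names (count_authors_dom, count_authors_sax, AuthorCounter) are hypothetical. With minidom, creating the parser and parsing the whole document happen inside a single parseString call.

```python
import io
import xml.sax
from xml.dom import minidom

BOOKS_XML = (
    b"<books><book><author>Goldfarb</author><author>Prescod</author></book>"
    b"<book><author>Spencer</author></book></books>"
)


def count_authors_dom(data):
    # DOM: parse the whole document first, then process the finished tree.
    doc = minidom.parseString(data)
    return doc.getElementsByTagName("author").length


class AuthorCounter(xml.sax.ContentHandler):
    """Event handler that counts <author> start tags as they arrive."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "author":
            self.count += 1


def count_authors_sax(data):
    parser = xml.sax.make_parser()       # create a parser instance
    handler = AuthorCounter()
    parser.setContentHandler(handler)    # register the event handler
    parser.parse(io.BytesIO(data))       # the parser calls the handler while parsing
    return handler.count


print(count_authors_dom(BOOKS_XML), count_authors_sax(BOOKS_XML))
```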

5 - 21

Namespace Support

<?xml version="1.0"?>
<order xmlns="http://www.net-standard.com/namespaces/order"
       xmlns:bk="http://www.net-standard.com/namespaces/books"
       xmlns:cust="http://www.net-standard.com/namespaces/customer">
  ...
  <bk:book>
    <bk:title>XML Handbook</bk:title>
    <bk:isbn>0130811521</bk:isbn>
  </bk:book>
  ....
</order>

5 - 22

Access to Qualified Elements

Node "book"                                     DOM Level 2 (interface "Node")    SAX 2.0 (startElement)
bk:book                                         nodeName                          qName
http://www.net-standard.com/namespaces/books    namespaceURI                      uri
bk                                              prefix
book                                            localName                         localName
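
Both access paths can be sketched with Python's standard library, which is only one possible binding of DOM Level 2 and SAX 2.0. The shortened order document and the class name NSHandler are assumptions for this illustration. In Python's SAX binding the namespace-aware callback is called startElementNS and receives the URI and local name as a pair; the default expat-based reader may pass None as the qualified name, so the sketch does not rely on it.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import ContentHandler, feature_namespaces

ORDER_NS = "http://www.net-standard.com/namespaces/order"
BOOKS_NS = "http://www.net-standard.com/namespaces/books"
ORDER_XML = (
    b'<order xmlns="http://www.net-standard.com/namespaces/order" '
    b'xmlns:bk="http://www.net-standard.com/namespaces/books">'
    b"<bk:book><bk:title>XML Handbook</bk:title></bk:book></order>"
)

# DOM Level 2: namespace properties on the node itself.
doc = minidom.parseString(ORDER_XML)
book = doc.getElementsByTagNameNS(BOOKS_NS, "book").item(0)
print(book.nodeName, book.namespaceURI, book.prefix, book.localName)


class NSHandler(ContentHandler):
    """Collects (uri, localName) for every element start."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        uri, local_name = name           # name is a (uri, localName) pair
        self.seen.append((uri, local_name))


# SAX 2.0: switch on namespace processing before parsing.
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
handler = NSHandler()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(ORDER_XML))
print(handler.seen)
```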

5 - 23

Generation of Data Structures

[Figure, upper half: the DTD / schema 'yacht' passes through a generation step that produces a class:]

01 yacht
   05 name
   05 details
      10 type

[Figure, lower half: an XML document passes through a processing step that produces an object of that class:]

<?xml version="1.0"?>
<yacht yachtid='147'>
  <name>Mona Lisa</name>
  <image file='yacht147.jpg'/>
  <description>Any text describing this yacht 147</description>
  <details>
    <type>GULFSTAR 55</type>
    <length>1700</length>
    <width>480</width>
    <draft>170</draft>
    <sailsurface>112</sailsurface>
    <motor>84</motor>
    <headroom>202</headroom>
    <bunks>8</bunks>
  </details>
</yacht>

01 yacht
   05 VENTANA
   05 details
      10 GULFSTAR 55
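
The idea can be sketched in Python, with the caveat that the classes below are written by hand and merely stand in for what a generator would derive from the DTD or schema. The names Yacht, Details, text_of, and yacht_from_xml are hypothetical, and only the name and type fields from the record layout are modeled.

```python
from dataclasses import dataclass
from xml.dom import minidom


@dataclass
class Details:
    type: str


@dataclass
class Yacht:
    # Hand-written stand-in for a generated class (01 yacht / 05 name / 05 details).
    name: str
    details: Details


def text_of(parent, tag):
    """Return the text content of the first <tag> element below parent."""
    return parent.getElementsByTagName(tag).item(0).firstChild.data


def yacht_from_xml(data):
    """Processing step: turn a yacht document into a Yacht object."""
    root = minidom.parseString(data).documentElement
    details = root.getElementsByTagName("details").item(0)
    return Yacht(name=text_of(root, "name"),
                 details=Details(type=text_of(details, "type")))


YACHT_XML = (
    "<yacht yachtid='147'><name>Mona Lisa</name>"
    "<details><type>GULFSTAR 55</type></details></yacht>"
)
yacht = yacht_from_xml(YACHT_XML)
print(yacht.name, yacht.details.type)
```

The application then works with typed fields such as yacht.details.type instead of navigating nodes.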

5 - 24

Summary

To avoid expensive text processing, applications use an XML parser that creates a DOM tree of a document.

The DOM provides a standardized API to access the content of documents and to manipulate them.

Alternatively or additionally, applications can work event-based using the SAX interface, which is provided by many parsers.