Dec 26, 2015

Page 1: XML Tutorial Walter Underwood Senior Staff Engineer Infoseek wunder@infoseek.com.

XML Tutorial

Walter Underwood
Senior Staff Engineer

wunder@infoseek.com


Outline

I. XML: Why? What is it?

II. Document Types: representing content

III. Stylesheets: representing presentation


Part I. Why? What is it?


What is XML?

Extensible Markup Language
Structured markup
Simplified SGML
Next-generation HTML
W3C Recommendation (spec)
Easy to use, easy to implement
A buzzword the press can spell


What is XML not?

A programming language
A single document type (memo, paper)
Replacement for MS Word or FrameMaker
An ANSI or ISO standard


Family Tree

GML (1969)
  SGML (1985)
    HTML (1993)
    XML (1998)

Dates are first publication of draft specification


Why not SGML?

Tools are hard to write
Tools are expensive
Depends on environment (interchange is difficult)
If it did the job, we'd already be using it


Why not HTML?

Backward compatibility, old browsers
Hard to extend (still no formulas, figures)
Based on SGML (see previous slide)
Too much illegal HTML in use, need clean slate


An HTML example

<html>
<body>
<h1>The Purple Cow</h1>
I never saw a purple cow,<br>
I never hope to see one;<br>
But I can tell you, anyhow,<br>
I'd rather see than be one.<br>
</body>
</html>


Same thing in XML

<?xml version="1.0"?>
<!DOCTYPE TEI.2 SYSTEM "tei.dtd">
<TEI.2>
<text>
<body>
<div1 type="poem">
<head>The Purple Cow</head>
<lg>
<l>I never saw a purple cow,</l>
<l>I never hope to see one;</l>
<l>But I can tell you, anyhow,</l>
<l>I'd rather see than be one.</l>
</lg>
</div1>
</body>
</text>
</TEI.2>


Same thing formatted

The Purple Cow

I never saw a purple cow,
I never hope to see one;
But I can tell you, anyhow,
I'd rather see than be one.


Basic Syntax

Starts with XML declaration
  <?xml version="1.0"?>

Rest of document inside the "root element"
  <TEI.2>…</TEI.2>

All text contained in some element
  <head>The Purple Cow</head>

Start and end tags must match exactly


Well-formed vs. Valid

XML must be well-formed
  correct syntax
  tags match, tags nest, all characters legal
  parser must reject if not well-formed

XML may be valid with respect to a DTD (Document Type Definition)
  tags are used correctly
  tags are all declared
  attributes are declared
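The "parser must reject" rule can be tried out with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which checks well-formedness only (it does not validate against a DTD). The helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<lg><l>I never saw a purple cow,</l></lg>"
bad_nesting = "<lg><l>I never saw a purple cow,</lg></l>"  # tags do not nest
bad_match = "<head>The Purple Cow</Head>"                  # names differ in case

print(is_well_formed(good), is_well_formed(bad_nesting), is_well_formed(bad_match))
```

Note that the third case fails because element names are case-sensitive: the start and end tags must match exactly.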


Validity Checking

Checks everything specified in a DTD
Can't check text (currency, spelling)
Checks against DTD: this is a valid memo, book, bibliography, ...
XML editors usually require validity
Other tools (search engines) might not


XML Syntax

The XML declaration
Elements
Entities
Text
Declarations and Notations
Processing Instructions
Comments


The XML Declaration

At very beginning of file
Officially optional, but always use it
Can declare version, encoding, standalone
  Must be in that order
  version is required; encoding and standalone are optional
Must declare encodings other than UTF-8 and UTF-16
  <?xml version="1.0" encoding="Big5"?>
  <?xml version="1.0" encoding="ISO-8859-1"?>
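To see why the encoding has to be declared, here is a small sketch using Python's standard xml.etree.ElementTree. It feeds the parser the same ISO-8859-1 bytes with and without a declaration. The element name street and the helper parses are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

body = "<street>Kurf\u00fcrstendamm 175</street>"

# Same bytes on disk, with and without an encoding declaration.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
undeclared = body.encode("iso-8859-1")

def parses(data):
    """Return True if the byte string is accepted by the parser."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

street = ET.fromstring(declared).text
print(street, parses(declared), parses(undeclared))
```

Without the declaration the parser assumes UTF-8, and the single byte used for "ü" in ISO-8859-1 is not legal UTF-8, so the document is rejected.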


Elements

Containing: <person>Nico</person>
Empty: <br/>
Attributes: <date format="iso8601">…
Names can use any Unicode letter, digit, or '.', '-', '_', or ':' (':' is reserved), and must start with a letter, '_', or ':'
  <Straße>Kurfürstendamm 175</Straße>


Elements Express Structure

Heading is inside poem element
  <div1 type="poem">
  <head>The Purple Cow</head>

Shows the lines of the poem, not the line breaks on the page
  I never saw a purple cow<br>       HTML
  <l>I never saw a purple cow</l>    XML

Space between elements is ignored


The Document Tree

<TEI.2>
  <text>
    <body>
      <div1>
        <head></head>
        <lg>
          <l></l>
          <l></l>
        </lg>
      </div1>
    </body>
  </text>
</TEI.2>
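A parser hands an application exactly this kind of tree. As a rough sketch, the following Python code (standard xml.etree.ElementTree) parses a shortened copy of the poem and prints one indented line per element. The function name outline and the two-line version of the poem are assumptions for the example.

```python
import xml.etree.ElementTree as ET

POEM = (
    '<TEI.2><text><body><div1 type="poem">'
    "<head>The Purple Cow</head>"
    "<lg><l>I never saw a purple cow,</l><l>I never hope to see one;</l></lg>"
    "</div1></body></text></TEI.2>"
)

def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(POEM))
print("\n".join(tree_lines))
```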


Elements and Attributes

Attributes can parameterize an element
  <div1 type="poem">
  <div1 type="abstract">
  <div1 type="chapter">
  <date format="iso8601">
  <subject scheme="LCSH">

Not as flexible as elements
Don't use to save bytes, compress instead
  <author first="Fred" last="Flintstone"/>    not good


Attribute Syntax

Name can use any Unicode letter, digit, or '.', '-', '_', or ':' (':' is reserved), and must start with a letter, '_', or ':'
Cannot repeat
Order doesn't matter
Values must be quoted (single or double)
Values may not contain "<"
Values may have defaults in DTD
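These rules are well-formedness rules, so a parser enforces them even without a DTD. A minimal sketch in Python (standard xml.etree.ElementTree); the helper name rejected and the sample date elements are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

def rejected(text):
    """Return True if the parser refuses the document."""
    try:
        ET.fromstring(text)
        return False
    except ET.ParseError:
        return True

# Single or double quotes are both fine; the parser returns just the value.
date = ET.fromstring("<date format='iso8601'>1999-11-03</date>")
fmt = date.get("format")

repeated = rejected('<date format="iso8601" format="us"/>')  # attribute repeated
unquoted = rejected("<date format=iso8601/>")                # value not quoted
raw_lt = rejected('<date format="a<b"/>')                    # "<" inside a value

print(fmt, repeated, unquoted, raw_lt)
```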


Special Attributes

xml:lang for language
id has unique identifier for element
idref references an id
xml:* is reserved


Entities

Just like HTML, but better
Five predefined entities
  &amp; &apos; &lt; &gt; &quot;
Define your own in DTD
  <!ENTITY euro "&#x20AC;">
Use numeric character references
  &#x20AC; &#8364;
Use Unicode directly
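As a quick check of the predefined entities and numeric character references, the sketch below parses a small element with Python's standard xml.etree.ElementTree and escapes a string for output with xml.sax.saxutils.escape. The sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# All five predefined entities, plus the euro sign as hex and decimal references.
source = "<p>&lt;b&gt; &amp; &quot;x&quot; &apos;y&apos; &#x20AC; &#8364;</p>"
text = ET.fromstring(source).text

# Going the other way: escape markup characters before writing text into XML.
escaped = escape("AT&T <b>")

print(text)
print(escaped)
```

The hexadecimal and decimal references name the same character, so both come back as the euro sign.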


Text

Unicode 2.0, see www.unicode.org
Use predefined entities (&lt; &amp; …)
  XML Example: &lt;char>&amp;amp;&lt;/char>
CDATA ("character data") section for raw text without using entities
  <![CDATA[ XML example: <char>&amp;</char> ]]>
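The two spellings above give the application the same text. A short sketch with Python's standard xml.etree.ElementTree; the wrapper element name ex is an assumption, and the surrounding spaces inside the CDATA section are dropped so the two strings compare equal:

```python
import xml.etree.ElementTree as ET

# Escaped with predefined entities.
with_entities = ET.fromstring(
    "<ex>XML example: &lt;char>&amp;amp;&lt;/char></ex>"
).text

# The same text inside a CDATA section, where nothing is interpreted as markup.
with_cdata = ET.fromstring(
    "<ex><![CDATA[XML example: <char>&amp;</char>]]></ex>"
).text

print(with_entities)
print(with_cdata)
```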


Declarations

Allow validity checking
Optional
May be internal (in document), external, or both
DTD (Document Type Definition) is all active declarations
Use existing DTDs when possible


External DTD

Most common
Use DOCTYPE declaration before root element
  <!DOCTYPE greeting SYSTEM "hello.dtd">
  <greeting>Hello, world!</greeting>


Internal (standalone) DTD

For custom documents
Also uses DOCTYPE declaration
  <!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  ]>
  <greeting>Hello, world!</greeting>
Specify in XML declaration
  <?xml version="1.0" standalone="yes"?>


External plus Internal DTD

Usually to declare entities
Use DOCTYPE declaration before root element
  <!DOCTYPE greeting SYSTEM "hello.dtd" [
  <!ENTITY excl "&#x21;">
  ]>
  <greeting>Hello, world&excl;</greeting>
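Even a non-validating parser reads the internal subset and expands entities declared there. The sketch below shows this with Python's standard xml.etree.ElementTree. As an assumption for the example, the external reference to hello.dtd is left out (so nothing has to be fetched) and the element declaration is placed in the internal subset instead.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY excl "&#x21;">
]>
<greeting>Hello, world&excl;</greeting>"""

# The parser replaces &excl; with the declared replacement text.
greeting = ET.fromstring(DOC).text
print(greeting)
```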


Element Type Declarations

Declare name
Declare allowed content
  <!ELEMENT a EMPTY>
  <!ELEMENT b ANY>
  <!ELEMENT either (one | theother)>
  <!ELEMENT ordered (first, second)>
  <!ELEMENT list (item+)>
  <!ELEMENT dl ((dt?, dd?)*)>
  <!ELEMENT text (#PCDATA)>
  <!ELEMENT mixed (#PCDATA | b | i | em)*>


Attribute List Declarations

Declare attributes for an element
Declare value types
Declare defaults
  <!ATTLIST termdef
            id   ID    #REQUIRED
            name CDATA #IMPLIED>
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
  <!ATTLIST form method CDATA #FIXED "POST">
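Declared defaults are visible to applications. The sketch below assumes the declarations sit in the internal subset, where the expat-based parser behind Python's xml.etree.ElementTree reads them and supplies the default value. The item element and both sample documents are assumptions for the example.

```python
import xml.etree.ElementTree as ET

SUBSET = """<!DOCTYPE list [
  <!ELEMENT list (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
]>
"""

# No type attribute in the document: the declared default is supplied.
defaulted = ET.fromstring(SUBSET + "<list><item>first</item></list>").get("type")

# An explicit value in the document wins over the default.
explicit = ET.fromstring(SUBSET + '<list type="bullets"><item>first</item></list>').get("type")

print(defaulted, explicit)
```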


Entity Declarations

Pretty names for characters
  <!ENTITY copy "&#x00A9;">
Boilerplate
  <!ENTITY copyright "&copy; Infoseek Corp. 1999, All rights reserved">
Used extensively in complex DTDs


Notations

A name of something outside of XML
  an unparsed entity
  target of a processing instruction
Mostly useful to applications
  <!NOTATION WunderFormatter SYSTEM "http://wunderco.com/formatter/">


Processing Instructions

Instructions to applications
  fonts? security? correctness checks?
Linking to a style sheet
  <?xml-stylesheet href="mystyle.css" type="text/css"?>
Instructions to indexing robots
  <?robots index="no" follow="yes"?>
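An application sees each processing instruction as a target name plus an uninterpreted data string. A minimal sketch using Python's standard xml.dom.minidom; the empty doc root element is an assumption added so the sample is a complete document.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="mystyle.css" type="text/css"?>'
    '<?robots index="no" follow="yes"?>'
    "<doc/>"
)

# Collect (target, data) for every processing instruction before the root element.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

The data looks like attributes, but the parser does not interpret it; splitting it into name and value pairs is left to the application that owns the target.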


Comments

Like HTML and SGML
  <!-- a comment -->
Anything is OK inside a comment, except "--"
  <!-- <head> & <tail> are elements -->
  <!-- <?xml?> declaration goes here -->
But don't use structured comments, use processing instructions instead
  <!-- Font: Treefrog -->                wrong
  <?WunderFormatter font="Treefrog"?>    right


Unicode and Encodings

Unicode in programs
  UCS-2: two-byte characters
  UCS-4: four-byte characters (future)
Unicode in files
  UTF-8: ASCII is ASCII, the rest are 2 to 4 bytes
  UTF-16: two octets per character, initial
ASCII with numeric character references works, too (&#x00A9; for ©)
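The byte counts are easy to confirm. The sketch below uses only Python's built-in str.encode; the three sample characters are chosen for illustration.

```python
samples = {"A": "A", "copyright": "\u00a9", "euro": "\u20ac"}

# UTF-8: ASCII stays one byte, other characters take more.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# UTF-16 (big-endian, no byte order mark): two bytes for each of these characters.
utf16_lengths = {name: len(ch.encode("utf-16-be")) for name, ch in samples.items()}

# Plain ASCII output, with anything else written as a numeric character reference.
ascii_with_refs = "\u00a9 Infoseek".encode("ascii", "xmlcharrefreplace")

print(utf8_lengths, utf16_lengths, ascii_with_refs)
```

The xmlcharrefreplace error handler writes decimal references, so the copyright sign comes out as &#169; rather than &#x00A9;. Both name the same character.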


Part II. Document Types: representing content


What is a "document type"?

Technical report
Specification
Bug report
Experiment run summary
Software manual
Novel
Poem
Play


What is a DTD?

"Document Type Definition"
Bunch of XML declarations
Usually external to document
Designed for some purpose (use one that matches your needs)
Best left to experts


Types of Document Types

Text
  TEI (scholarly editions)
  DocBook (software documentation)
  NITF (news articles)
Data
  CML (Chemical Markup Language)
  AIML (Astronomical Instrument ML)
Mixed
  often custom (bug reports)


A Bug Report Document

<?xml version="1.0"?>
<bugreport>
<product>xmltron</product>
<version>1.1</version>
<os>RTE</os>
<osversion>4.0</osversion>
<date scheme="ISO8601">1999-11-03</date>
<report>
<summary>doesn't work</summary>
<detail>at all</detail>
</report>
<solution>none yet</solution>
</bugreport>


Make a Document Type

<!DOCTYPE bugreport [
<!-- declarations go here -->
]>
<bugreport> ...

Doctype and root element must match


Declarations for Elements

<!DOCTYPE bugreport [
<!ELEMENT bugreport wait 'til next slide>
<!ELEMENT product (#PCDATA)>
<!ELEMENT version (#PCDATA)>
<!ELEMENT os (#PCDATA)>
<!ELEMENT osversion (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT report (summary, detail)>
<!ELEMENT summary (#PCDATA)>
<!ELEMENT detail (#PCDATA)>
<!ELEMENT solution (#PCDATA)>
]>


Declaration for Root Element

<!DOCTYPE bugreport [
<!ELEMENT bugreport (product, version, os, osversion, date, report, solution?)>

<solution> is optional, others required and must be in this order.
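A content model like this is essentially a regular expression over child element names. Python's standard library has no validating parser, so the following is only a hand-written sketch of what a validator checks for this one declaration, not real DTD validation. The names BUGREPORT_MODEL and matches_model and the shortened sample documents are assumptions for the example.

```python
import re
import xml.etree.ElementTree as ET

# (product, version, os, osversion, date, report, solution?) written as a
# regular expression over the child names, each name followed by one space.
BUGREPORT_MODEL = re.compile(r"product version os osversion date report (solution )?")

def matches_model(xml_text):
    """Check root name and child order against the bugreport content model."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return root.tag == "bugreport" and BUGREPORT_MODEL.fullmatch(names) is not None

required = "<product/><version/><os/><osversion/><date/><report/>"
complete = "<bugreport>" + required + "<solution/></bugreport>"
no_solution = "<bugreport>" + required + "</bugreport>"
wrong_order = "<bugreport><version/><product/><os/><osversion/><date/><report/></bugreport>"

print(matches_model(complete), matches_model(no_solution), matches_model(wrong_order))
```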




Declarations for Attributes

<!ATTLIST date scheme CDATA #IMPLIED>

"CDATA" instead of "PCDATA" means the value is plain character data and cannot contain element markup (entity references in it are still expanded)

#IMPLIED means optional (value implied by document)

separate ATTLIST declarations for the same element are OK

internal ATTLIST declarations override external


Reusing Element Declarations

<product>
  <name>xmltron</name>
  <version>1.1</version>
</product>
<os>
  <name>RTE</name>
  <version>4.0</version>
</os>

Use the same elements for product and OS info.


New Declarations for Elements

<!ELEMENT product (name, version)>
<!ELEMENT os (name, version)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT version (#PCDATA)>


Customizing Existing DTDs

Add attributes
Add entities
Rarely change elements
  Can't override element declarations
  Can add new child elements to those that allow ANY
Some DTDs are designed for extensions


Part III. Stylesheets: representing presentation


documents = contents + style

Extensible Stylesheet Language (XSL)
Specifications still in draft
But implementations keeping pace


XSL is in Three Parts

XSLT: transformation
XPath: addressing parts of an XML document
FO: formatting objects

We will cover only XSLT today


Client-side XSL

XML

XSLT

FO


Server-side XSL

XML + XSLT -> XSLT engine -> HTML


XML into HTML

XSLT can transform into (called "output method"):
  XML
  HTML
  text
Server-side XSLT engine
  content in XML
  served as HTML
  browser never knows


Transforming The Purple Cow

Add HTML intro and outro
convert <head> to <h1>
convert <lg> to <p> (at beginning of stanza)
convert <l> to <br> (at end of line)


The Purple Cow (XML)

<?xml version="1.0"?>
<!DOCTYPE TEI.2 SYSTEM "tei.dtd">
<?xml-stylesheet href="purple.xsl" type="text/xml"?>
<TEI.2>
<text>
<body>
<div1 type="poem">
<head>The Purple Cow</head>
<lg>
<l>I never saw a purple cow,</l>
<l>I never hope to see one;</l>
<l>But I can tell you, anyhow,</l>
<l>I'd rather see than be one.</l>
</lg>
</div1>
</body>
</text>
</TEI.2>


The Purple Cow (HTML)

<html>
<body>
<h1>The Purple Cow</h1>
I never saw a purple cow,<br>
I never hope to see one;<br>
But I can tell you, anyhow,<br>
I'd rather see than be one.<br>
</body>
</html>


Intro and Outro

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="TEI.2">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>


XSLT So Far

It is XML
Uses XML Namespaces: no name conflicts
Defaults to text/xml output method
Uses text/html if <html> is output at root
Applies templates to input


A Template for Text Content

<xsl:template match="head">
  <h1>
    <xsl:apply-templates/>
  </h1>
</xsl:template>

Default element rule applies templates
Default text rule copies to output
IE5 doesn't implement the default rules


Default Templates

<!-- Default template for elements, applies to children -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Default template for text and attribute nodes, copies content to output -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
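With only these two default rules in play, a transformation simply outputs the text of the document with all tags removed. The sketch below imitates that behaviour in Python with the standard xml.etree.ElementTree. It is an illustration of the rules, not an XSLT engine, and the function name apply_default_rules and the sample stanza are assumptions.

```python
import xml.etree.ElementTree as ET

def apply_default_rules(element):
    """Element rule: process the children. Text rule: copy text to the output."""
    out = [element.text or ""]
    for child in element:
        out.append(apply_default_rules(child))
        out.append(child.tail or "")  # text that follows the child element
    return "".join(out)

stanza = ET.fromstring(
    "<lg><l>I never saw a purple cow,</l> <l>I never hope to see one;</l></lg>"
)
plain = apply_default_rules(stanza)
print(plain)
```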


Line Groups and Lines

<!-- wrap each stanza in a <p> -->
<xsl:template match="lg">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>

<!-- put a <br> after each line -->
<xsl:template match="l">
  <xsl:apply-templates/>
  <br/>
</xsl:template>


A Complete Stylesheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="TEI.2">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <xsl:template match="head">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
  <xsl:template match="lg">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <xsl:template match="l">
    <xsl:apply-templates/><br/>
  </xsl:template>
</xsl:stylesheet>
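Python's standard library does not include an XSLT engine, so the following is not XSLT. It is a hand-written sketch of the same four rules, using xml.etree.ElementTree, meant only to show what the templates do to the tree. The function name transform and the shortened two-line poem are assumptions for the example; a real deployment would run the stylesheet through an XSLT processor.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

POEM = (
    '<TEI.2><text><body><div1 type="poem">'
    "<head>The Purple Cow</head>"
    "<lg><l>I never saw a purple cow,</l><l>I never hope to see one;</l></lg>"
    "</div1></body></text></TEI.2>"
)

def transform(element):
    """Apply the four rules above; any other element falls through to the default."""
    # Equivalent of <xsl:apply-templates/>: process text and children in order.
    inner = escape(element.text or "")
    for child in element:
        inner += transform(child) + escape(child.tail or "")
    if element.tag == "TEI.2":
        return "<html><body>" + inner + "</body></html>"
    if element.tag == "head":
        return "<h1>" + inner + "</h1>"
    if element.tag == "lg":
        return "<p>" + inner + "</p>"
    if element.tag == "l":
        return inner + "<br>"
    return inner  # default rule: keep the content, drop the tag

html = transform(ET.fromstring(POEM))
print(html)
```

Elements without a rule of their own (text, body, div1) contribute only their content, which is why the TEI wrapper elements vanish from the HTML.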


Other XSL Features

Cascading stylesheets
Including stylesheets
Conditionals (if/else), variables
Relative selectors, XPath selectors
Counting, sorting
String and number manipulation
Template modes (e.g. table-of-contents and full)


Why do it?

Different HTML for different browsers (make sure the default works!)

Index only content with search engine

Generate RTF or TeX with text output method

Analyze XML files (all meta data defined?)

Convert between DTDs


Questions?


Resources and URLs


XML Information

XML at W3C
  www.w3.org/XML
  www.w3.org/TR/REC-xml
The Annotated XML Spec
  www.xml.com/pub/axml/axmlintro.html
The Robin Cover SGML/XML page (encyclopedic!)
  www.oasis-open.org/cover/
The XML Bible, Elliotte Rusty Harold
  updates at: metalab.unc.edu/xml/books/bible/
www.xml.com (articles and directory)


XML Software

SAX (Simple API for XML)
  www.megginson.com/SAX
  www.jclark.com/XML (C and Java parsers)
DOM (Document Object Model)
  www.w3c.org/DOM (specs)
  www.alphaworks.ibm.com (XML4J parser)
  developer.java.sun.com/developer/products/xml/ (Project X)
Parser conformance testing
  www.xml.com/pub/1999/09/conformance/
  www.oasis-open.org/cover/xmlConformance.html
Avoid MSXML (Microsoft), non-standard and buggy


General DTD Resources

Structuring XML Documents, David Megginson
The XML and SGML Cookbook: Recipes for Structured Information, Rick Jelliffe
  more an SGML book, but excellent on internationalization


Specific DTD Resources

Inside XML DTDs: Scientific and Technical, Simon St. Laurent
DocBook: The Definitive Guide, Norman Walsh and Leonard Muellner
TEI Lite and Bare Bones TEI (SGML)
  www.tei-c.org (TEI Consortium)
  www-tei.uic.edu/orgs/tei/intros/teiu5.html
  www-tei.uic.edu/orgs/tei/intros/teiu6.html
Chemical Markup Language: www.xml-cml.org
MathML: www.w3.org/TR/REC-MathML


XSL Resources

Warning: XSL changed in August 1999!
W3C Style Activity
  www.w3c.org/Style
Updated XSL chapter from The XML Bible
  metalab.unc.edu/xml/books/bible/updates/14.html
James Clark's XT (XSLT implementation)
  www.jclark.com/xml/xt.html