An Introduction to XML
Patrice Bonhomme & Laurent Romary
Lucid-IT, LORIA
bonhomme@lucid-it.com, romary@loria.fr
eXtensible Markup Language version 1.0 Recommendation, February 1998
Dec 27, 2015

Objectives

Understanding the basic concepts of XML:
- Elements, attributes and content
- DTD (and Schemas)
- Namespaces

An overview of the main associated recommendations:
- XML Path Language (XPath)
- XML pointers and links (XPointer and XLink)
- The transformation language of XSL (eXtensible Stylesheet Language)

XML in the document chain

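[Diagram: the document chain in four stages, each with its formats and its focus]
- Conception: DTD/Schema (structures)
- Editing: XML (data)
- Transformation: XML, XSL/XSLT (data processing)
- Consultation: HTML, XHTML (user perspective)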

A quick historical overview

1986 SGML (Standard Generalized Markup Language), ISO standard ISO 8879:1986

1987 TEI (Text Encoding Initiative)

1990 HTML 1.0 (HyperText Markup Language)

1997/1998 XML 1.0 (eXtensible Markup Language)

What XML is:

- XML: eXtensible Markup Language
- A W3C (World Wide Web Consortium) Recommendation
- A meta-language: it lets you define your own markup language
- A simplification of the SGML standard
  - SGML was intended to represent the "logical" structure of a document
  - HTML was conceived as an application of SGML

A simplified SGML

- An XML document is an SGML document
  - With some slight (but essential) differences...
- XML has the expressive power of SGML without its complexity
- Opens the door to the transmission of structured documents on the web
  - Databases also entered the game...

What can we do with it?

- Data modeling (as a complement to UML, for instance)
- Publication of structured data on the web
- Separation of the logical structure of a document from its actual presentation
- Distributed applications (cf. well-formed vs. valid documents)
- Integrating data from heterogeneous sources

Why can't we avoid it?

- Simplicity, which makes it easy to integrate into any kind of application
  - XML specification = 36 pages
  - SGML standard, ISO 8879 = 250 pages
- Wide variety of applications already implemented
  - Industry: publishing, databases, cataloguing, e-business, etc.
  - Science, research: genomics, astronomy, maths, etc.
- Consequence: a lot of software is available
  - Editors, parsers, bridges from and to existing editing environments or DBMSs

From HTML to XML - 1

A simple HTML document:

<B> Patrice Bonhomme </B>
<P>[email protected]
<BR>tél : 03 83 59 30 52
<BR>fax : 03 83 41 30 79
<BR>équipe : Langue et Dialogue (<I>LORIA</I>)
<BR>

From HTML to XML - 2

The XML way:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd">
<!-- A member of LORIA -->
<MEMBRE TYPE="IE" ID="M28">
  <NOM> BONHOMME </NOM>
  <PRENOM> Patrice </PRENOM>
  <MEL> [email protected] </MEL>
  <TEL> 03 83 59 30 52 </TEL>
  <FAX> 03 83 41 30 79 </FAX>
  <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
</MEMBRE>
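
To make the contrast with the HTML version concrete, here is a minimal Python sketch of what the named elements buy a program: each field can be read by name. It uses the standard `xml.etree.ElementTree` module. The XML declaration and DOCTYPE are left out, the e-mail address is a hypothetical placeholder, and `read_member` is a name invented for this illustration.

```python
import xml.etree.ElementTree as ET

# The MEMBRE record from the slide, without the XML declaration and DOCTYPE.
# The e-mail address is a hypothetical placeholder, not the one on the slide.
MEMBRE_XML = """
<MEMBRE TYPE="IE" ID="M28">
  <NOM> BONHOMME </NOM>
  <PRENOM> Patrice </PRENOM>
  <MEL> someone@example.org </MEL>
  <TEL> 03 83 59 30 52 </TEL>
  <FAX> 03 83 41 30 79 </FAX>
  <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
</MEMBRE>
"""


def read_member(xml_text):
    """Return the fields of a MEMBRE element as a plain dictionary."""
    membre = ET.fromstring(xml_text)
    record = {"id": membre.get("ID"), "type": membre.get("TYPE")}
    for child in membre:
        # The element name tells us what the text means.
        record[child.tag.lower()] = (child.text or "").strip()
    record["lab"] = membre.find("EQUIPE").get("LAB")
    return record


member = read_member(MEMBRE_XML)
```

With the HTML version, the same program would have to guess which `<BR>`-separated line holds the phone number.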

Some properties of XML

- Emphasis should be put on the "semantics" of a document
- Underlying model: tree structure
- Possibility to imagine a script language to access any part of an XML document
  e.g.: DB/MEMBRE[28]/MEL/text()
- XML supports Unicode character encodings

Elements and their content

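<MEMBRE TYPE="IE" ID="M28">
  <LOGIN ID="bonhomme"/>
  <NOM> BONHOMME </NOM>
  <PRENOM> Patrice </PRENOM>
  <MEL> [email protected] </MEL>
  <TEL> 03 83 59 30 52 </TEL>
  <FAX> 03 83 41 30 79 </FAX>
  <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
</MEMBRE>

Labels on the slide:
- Opening tag, e.g. <NOM>
- Closing tag, e.g. </NOM>
- Textual content, e.g. BONHOMME
- Empty element: <LOGIN ID="bonhomme"/>
- Element: an opening tag, its content and the matching closing tag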

Elements and their attributes

<MEMBRE TYPE="IE" ID="M28">
  <LOGIN ID="bonhomme"/>
  <NOM> BONHOMME </NOM>
  <PRENOM> Patrice </PRENOM>
  <MEL> [email protected] </MEL>
  <TEL> 03 83 59 30 52 </TEL>
  <FAX> 03 83 41 30 79 </FAX>
  <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
</MEMBRE>

Labels on the slide:
- Attribute name, e.g. TYPE
- Attribute value, e.g. "IE"

Other features

- XML declaration
  <?xml version="1.0"?>
  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
- Comments
  <!-- this is a comment -->
- CDATA section
  <![CDATA[Langue & Dialogue]]>
- Processing instruction (application specific)
  <?edit line="wrap"?>
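
The CDATA example is easiest to understand by comparison. The sketch below uses Python's standard `xml.etree.ElementTree` parser to suggest that a CDATA section and an escaped `&amp;` carry the same text, and that CDATA is only a way of writing that text. Wrapping the text in an `EQUIPE` element is an assumption made so that the snippet is a complete document.

```python
import xml.etree.ElementTree as ET

# Same text, written once with a CDATA section and once with an entity reference.
with_cdata = ET.fromstring("<EQUIPE><![CDATA[Langue & Dialogue]]></EQUIPE>")
with_entity = ET.fromstring("<EQUIPE>Langue &amp; Dialogue</EQUIPE>")

cdata_text = with_cdata.text
escaped_text = with_entity.text

# When the tree is written back out, the parser's own escaping is used:
# the CDATA wrapper is not part of the data model.
reserialized = ET.tostring(with_cdata, encoding="unicode")
```

A bare `&` outside a CDATA section would make the document ill-formed, which is why one of the two notations is needed.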

From one document to a class…

How do I know the structure of my document?

How may I share this structure with others?

Document Type Definition

- Expresses constraints on:
  - Allowed element and attribute names
  - Possible content of a given element ("content model")
  - To which elements a given attribute can be attached
- Similar to the traditional SGML approach, but:
  - Simplified syntax
  - The DTD is optional for a document

Example

<!ELEMENT MEMBRE (LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE)>
<!ELEMENT LOGIN EMPTY>
<!ATTLIST LOGIN ID ID #REQUIRED>
<!ELEMENT NOM (#PCDATA)>
...
<!ENTITY W3C "World Wide Web Consortium">
<!ENTITY chap1 SYSTEM "http://…/chapitre-1.xml">
<!ENTITY img2 SYSTEM "image2.gif" NDATA gif>
...
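
The content model of `MEMBRE` reads much like a regular expression over child element names: `?` means optional, `+` one or more, `*` zero or more. The Python sketch below makes that reading explicit. It is not a DTD validator (Python's standard library does not include one); it only handles a flat, comma-separated sequence like the one above, and `model_to_regex` and `children_match` are names invented for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content model copied from the <!ELEMENT MEMBRE ...> declaration.
CONTENT_MODEL = "(LOGIN, NOM?, PRENOM?, MEL, TEL+, FAX*, EQUIPE)"


def model_to_regex(model):
    """Turn a flat, comma-separated content model into a regular expression.

    Only a sequence of names with optional ?, + or * suffixes is handled,
    which is all the MEMBRE example needs.
    """
    parts = []
    for item in model.strip("() ").split(","):
        name, suffix = re.fullmatch(r"\s*(\w+)([?+*]?)\s*", item).groups()
        # Each child name is followed by one space in the string we match.
        parts.append(f"(?:{name} ){suffix}")
    return re.compile("".join(parts))


def children_match(element, model):
    """Check the child element names of `element` against the content model."""
    names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(names) is not None


# Two TEL elements and no FAX: allowed by TEL+ and FAX*.
ok = ET.fromstring(
    "<MEMBRE><LOGIN ID='b'/><MEL>m</MEL><TEL>1</TEL><TEL>2</TEL>"
    "<EQUIPE>e</EQUIPE></MEMBRE>"
)
# No TEL at all: rejected, because TEL+ requires at least one.
missing_tel = ET.fromstring(
    "<MEMBRE><LOGIN ID='b'/><MEL>m</MEL><EQUIPE>e</EQUIPE></MEMBRE>"
)
```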

Using a DTD

With an external DTD:

<!DOCTYPE MEMBRE SYSTEM "http://…/MEMBRE.dtd">
<MEMBRE TYPE="IE" ID="M28">
</MEMBRE>

With an internal DTD subset:

<!DOCTYPE MEMBRE [
<!ELEMENT MEMBRE … >
]>
<MEMBRE TYPE="IE" ID="M28">
</MEMBRE>

Valid vs. Well-formed

- Well-formed documents
  - Syntactic bracketing is preserved, without a DTD
  - Empty element: <toto></toto> = <toto/>
- Valid documents
  - With a DTD (à la SGML)
- Essential difference with SGML
  - Extracting and re-using document fragments
  - One usually produces valid documents and distributes well-formed ones
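
Well-formedness can be checked without any DTD, which is what makes well-formed documents convenient to distribute. A minimal sketch with Python's standard, non-validating `xml.etree.ElementTree` parser follows; `is_well_formed` and `same_element` are helper names made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def same_element(a, b):
    """Compare two one-element documents by tag, attributes, text and child count."""
    x, y = ET.fromstring(a), ET.fromstring(b)
    return (x.tag, x.attrib, x.text or "", len(x)) == (
        y.tag, y.attrib, y.text or "", len(y))


balanced = is_well_formed("<toto><titi/></toto>")
# Overlapping tags break the bracketing, so this is not well-formed.
overlapping = is_well_formed("<toto><titi></toto></titi>")
# The two notations for an empty element parse to the same thing.
empty_forms_equal = same_element("<toto></toto>", "<toto/>")
```

Validity would additionally require a validating parser and the DTD itself.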

XML namespaces

- Objectives: avoid conflicts between element and attribute names coming from various sources
  - Composite documents
  - XSLT instructions, Schema declarations
- Declaration:

<DOC xmlns:mml="http://www.w3.org/Math/MathML/"
     xmlns="http://www.ua99.net/DOC/1.0">
  <P>blah blah : <mml:fn mml:definitionURL="mydef.xml"> … </mml:fn> re blah blah</P>
</DOC>
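
What a namespace-aware parser makes of this declaration can be sketched in Python. The standard `xml.etree.ElementTree` module replaces every prefix by its namespace URI, so `DOC` (default namespace) and `mml:fn` end up with unambiguous names. The `doc` prefix in the `NS` mapping is chosen for this example only; prefixes are local shorthand and need not match those in the document.

```python
import xml.etree.ElementTree as ET

DOC_XML = """
<DOC xmlns:mml="http://www.w3.org/Math/MathML/"
     xmlns="http://www.ua99.net/DOC/1.0">
  <P>blah blah : <mml:fn mml:definitionURL="mydef.xml"> … </mml:fn> re blah blah</P>
</DOC>
"""

# Prefix-to-URI mapping used only for the search paths below.
NS = {
    "doc": "http://www.ua99.net/DOC/1.0",
    "mml": "http://www.w3.org/Math/MathML/",
}

root = ET.fromstring(DOC_XML)

# Unprefixed elements belong to the default namespace declared with xmlns="...".
root_name = root.tag

# Prefixed elements and attributes are expanded to {namespace-URI}local-name.
fn = root.find("doc:P/mml:fn", NS)
fn_name = fn.tag
definition_url = fn.get("{http://www.w3.org/Math/MathML/}definitionURL")
```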

Reserved namespaces

The xml: prefix is reserved by the W3C for specific attributes:

<title xml:space="default">...</title>

<p xml:lang="FR">…</p>

XPath

- XML Path Language 1.0 (REC 29012000)
- General-purpose syntax for addressing sub-parts of an XML document
- Joint specification used by XML pointers (XPointer recommendation) and the XSLT transformation language
- Allows one to access, select and filter XML fragments (cf. tree representation of an XML document)

Addressing nodes in XPath

- Absolute addressing
  - Given: a URL
  - id(M28), root()
- Relative addressing along axes
  - Given: a node
  - ancestor, child, descendant, psibling, fsibling (named preceding-sibling and following-sibling in XPath 1.0)

An XML document represents a hierarchical structure

[Tree diagram]

MEMBRE TYPE="IE" ID="M28"
  LOGIN id="bonhomme"
  NOM
    BONHOMME
  ...
  EQUIPE LAB="LORIA"
    Langue et Dialogue

The only view you should ever, ever have of an XML document
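
As an illustration of this tree view, the short Python sketch below walks a parsed document and returns one indented line per node. It uses the standard `xml.etree.ElementTree` module; `outline` is a hypothetical helper, and text that follows a child element (the "tail") is ignored to keep it short.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return the tree under `element` as a list of indented lines."""
    attrs = "".join(f' {name}="{value}"' for name, value in element.attrib.items())
    lines = ["  " * depth + element.tag + attrs]
    text = (element.text or "").strip()
    if text:
        # Textual content is shown as a leaf below its element.
        lines.append("  " * (depth + 1) + text)
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


membre = ET.fromstring(
    '<MEMBRE TYPE="IE" ID="M28"><LOGIN ID="bonhomme"/><NOM> BONHOMME </NOM>'
    '<EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE></MEMBRE>'
)
tree_lines = outline(membre)
```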

XPath - Examples

<DB>
  <MEMBRE TYPE="IE" ID="M28">
    <LOGIN ID="bonhomme"/>
    ...
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
  <MEMBRE TYPE="CR" ID="M14">
    <LOGIN ID="romary"/>
    ...
  </MEMBRE>
</DB>

/ or /DB
/DB/MEMBRE
/DB/MEMBRE[2]
/DB/MEMBRE[@ID='M28']/EQUIPE[1]/text()
/DB/MEMBRE/LOGIN[@ID='romary']/../@ID
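
These expressions can be tried out with Python's standard `xml.etree.ElementTree`, with one caveat: that module implements only a subset of XPath. Paths are written relative to the root element, and `text()` and `@ID` steps are not supported, so they are replaced here by the `.text` attribute and `.get("ID")`. The elided `...` content of the slide is simply left out.

```python
import xml.etree.ElementTree as ET

DB = ET.fromstring("""
<DB>
  <MEMBRE TYPE="IE" ID="M28">
    <LOGIN ID="bonhomme"/>
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
  <MEMBRE TYPE="CR" ID="M14">
    <LOGIN ID="romary"/>
  </MEMBRE>
</DB>
""")

# /DB/MEMBRE : every MEMBRE child of the root.
all_ids = [membre.get("ID") for membre in DB.findall("MEMBRE")]

# /DB/MEMBRE[2] : the second MEMBRE (positions start at 1).
second_id = DB.find("MEMBRE[2]").get("ID")

# /DB/MEMBRE[@ID='M28']/EQUIPE[1]/text()
team = DB.find("MEMBRE[@ID='M28']/EQUIPE[1]").text

# /DB/MEMBRE/LOGIN[@ID='romary']/../@ID : go down to LOGIN, back up to its parent.
owner_id = DB.find("MEMBRE/LOGIN[@ID='romary']/..").get("ID")
```

A full XPath 1.0 engine would evaluate the slide's expressions unchanged.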

XPointer

- Cf. HTML, where anchors are needed:
  <A NAME="TOTO">
  http://www.titi.fr/index.html#toto
- In XML, pointers can directly address a document component:
  http://…/doc.xml#xptr(id(M28))
  http://…/doc.xml#xptr(/DB/MEMBRE[28]/MEL)
- Advantage: no need to modify the target document (notion of primary source)

XLink

- In HTML: the elements which may carry links are known: <A>, <IMG>, ...
- In XML: any element may carry a simple or complex link
- This is done by using pre-defined attributes:
  <a xlink:type="simple" xlink:href="http://www.w3.org/">W3C</a>
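
Because the link lives in attributes rather than in a particular element name, a program can find links on any element. The Python sketch below assumes the `xlink` prefix is bound to the XLink namespace `http://www.w3.org/1999/xlink`; the slide's snippet omits that declaration, so it is added here to make the example parse on its own. `simple_links` is a name invented for this illustration.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # XLink namespace name

# The slide's element, with the xlink prefix declared so that it parses on its own.
link = ET.fromstring(
    f'<a xmlns:xlink="{XLINK}" xlink:type="simple" '
    'xlink:href="http://www.w3.org/">W3C</a>'
)


def simple_links(root):
    """Return (label, href) for every element carrying a simple XLink."""
    found = []
    for elem in root.iter():
        # The element name does not matter, only the xlink:* attributes do.
        if elem.get(f"{{{XLINK}}}type") == "simple":
            found.append(((elem.text or "").strip(), elem.get(f"{{{XLINK}}}href")))
    return found


links = simple_links(link)
```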

Visualizing XML documents

Basically, an XML document does not provide any information about its presentation

Visualizing a document may depend on the target audience, device etc.

Stylesheets:
- Cascading Style Sheets (CSS 1 and 2)
- Extensible Stylesheet Language (XSL) >> XSLT

eXtensible Stylesheet Language

Describes the way a document will be shown, printed or verbalized…

[Diagram: XML + XSL]

XSL: a two-fold proposal

XSL = Transformations + Visualizing properties
- XSLT: transformation of XML documents
  - Allows one to transform an XML document into another XML document
  - Use this to produce well-formed (!) HTML documents
- XSL FO: formatting XML data
  - FO = Formatting Objects
  - Is supposed to be application independent (Word/RTF, PS, PDF, MIF, …)
  - Not a recommendation yet :-(

General structure of an XSL document

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  …
  <xsl:template match="/"> … </xsl:template>
  <xsl:template match="NOM"> … </xsl:template>
</xsl:stylesheet>

Declarative approach

Sequence of rules (templates) specifying:
- The pattern (XPath) of nodes to which the rule can be applied
- Actions to be undertaken:
  - Elements to be generated in the target document
  - Selection of the elements to be further explored in the source document
- Additional functionalities: testing, sorting, etc.

A simple rule

<xsl:template match='/DB/MEMBRE/NOM'>
  <B> <xsl:apply-templates/> </B>
</xsl:template>

Labels on the slide:
- match='/DB/MEMBRE/NOM': pattern (XPath)
- <B>: HTML element to be produced
- <xsl:apply-templates/>: the content of <B> will be the one produced by this instruction

Creating an HTML core document

<xsl:template match="/">
  <HTML>
    <HEAD>
      <TITLE>My directory</TITLE>
    </HEAD>
    <BODY>
      <xsl:apply-templates/>
    </BODY>
  </HTML>
</xsl:template>

Selecting the nodes to be explored

<xsl:template match="MEMBRE">
  <P>
    <xsl:apply-templates select="NOM"/>
    <xsl:text> - </xsl:text>
    <xsl:apply-templates select="EQUIPE"/>
  </P>
</xsl:template>
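
To show how such rules cooperate, here is a small Python imitation of the three templates from the preceding slides (for "/", MEMBRE and NOM). It is not XSLT and does not use an XSLT processor: each rule is an ordinary function registered under an element name, and `apply_templates` plays the role of `<xsl:apply-templates/>`. All function names are invented for this sketch, the default rule is a simplification of XSLT's built-in behaviour (surrounding whitespace is stripped and text following a child element is ignored), and no HTML escaping is done.

```python
import xml.etree.ElementTree as ET

SOURCE = ET.fromstring("""
<DB>
  <MEMBRE TYPE="IE" ID="M28">
    <NOM>BONHOMME</NOM>
    <EQUIPE LAB="LORIA">Langue et Dialogue</EQUIPE>
  </MEMBRE>
</DB>
""")


def apply_templates(nodes, rules):
    """Render each node with the rule registered for its element name."""
    return "".join(rules.get(node.tag, default_rule)(node, rules) for node in nodes)


def default_rule(node, rules):
    """Fallback rule: emit the node's own text, then visit its children."""
    return (node.text or "").strip() + apply_templates(list(node), rules)


def membre_rule(node, rules):
    """Counterpart of the MEMBRE template: NOM, a separator, then EQUIPE."""
    nom = apply_templates(node.findall("NOM"), rules)
    equipe = apply_templates(node.findall("EQUIPE"), rules)
    return f"<P>{nom} - {equipe}</P>"


def nom_rule(node, rules):
    """Counterpart of the NOM template: wrap the content in <B>."""
    return f"<B>{default_rule(node, rules)}</B>"


RULES = {"MEMBRE": membre_rule, "NOM": nom_rule}


def transform(root):
    """Counterpart of the match="/" template: the HTML shell around the rest."""
    body = apply_templates([root], RULES)
    return (
        "<HTML><HEAD><TITLE>My directory</TITLE></HEAD>"
        f"<BODY>{body}</BODY></HTML>"
    )


html = transform(SOURCE)
```

The point of the declarative style carries over: no rule knows where it will be called from, only what to produce for the node it matches.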

Conclusion

- XML: a practical format (protocol)
- Next steps:
  - Sharing DTDs, resources, tools
  - Generic mechanisms for handling families of documents (cf. Nancy's presentation)
