XML, DOM and the Web
Madalina Croitoru, IUT Montpellier

XML, DOM and the Web - Page d'accueil / Lirmm.fr / - lirmmcroitoru/XML-PHP.pdf · • 1996: W3C Working Group SGML on the Web – SGML a very complex and expensive technology –

Jul 06, 2018

Page 1: XML, DOM and the Web

Madalina Croitoru, IUT Montpellier

Page 2: What is XML?

• Extensible Markup Language

• Markup Language (like HTML)

• Difference with HTML:

– HTML: designed to display data

– XML: designed to transport and carry data

– HTML: tags are already predefined

– XML: you define your own tags

Page 3: XML “does nothing”

• XML was created to structure, store and transport information:

<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Page 4: XML and the Web

• Early Web: URI + HTTP + HTML

– URIs identify resources

– HTTP retrieves resources

– HTML is the resource format

• Web Today: many different technologies:

– URI + HTTP + HTML + PHP for basic Web publishing

– CSS & JavaScript for advanced publishing

• JavaScript & XML (AJAX)

– Scripts dynamically loading data from a server

– Machine-to-machine interaction: the server and the script

Page 5: From Humans to Machines

• The Web was designed for humans:

– HTML is a language for describing page layouts and links

– Machines were only used for implementing it

• Search engines were the first machine users on the Web:

– They made the Web's success possible

– They demonstrated how hard it is to understand HTML pages

Page 6: HTML is for HUMANS

• HTML is:

– GOOD for rendering Web pages

– BAD for understanding Web pages

• Web growth in the late 1990s was enormous:

– Everybody was putting information online that was inaccessible to machines

Page 7: Machine-Friendly Web

• Information should be published in a machine-understandable way:

– Machines need other structures to process Web content

• 1996: W3C Working Group "SGML on the Web"

– SGML is a very complex and expensive technology

– HTML is just one document type defined with SGML

– How can SGML be made easily and widely usable?

Page 8: SGML, HTML and XML

• SGML: Standard Generalised Markup Language

– Language for designing document types

• HTML: Hypertext Markup Language

– Implements a simple SGML document type

– Its syntax is SGML but it uses very few SGML features

• XML: Extensible Markup Language

– A language for designing document types

– Greatly simplified version of SGML

Page 9: XML usage for the Web

• Server side foundation for Web publishing

• Successful:

– Technically sound (simple)

– Human-readable, based on a well-known syntax

– Great for rapid prototyping and experiments

• Ontologies etc.

Page 10: XML Usage elsewhere

• Messages from sensors

• Genome sequences

• Scalable Vector Graphics

• Etc.

• Information professionals should know and use XML:

– XML and some schema language

– XSLT for processing

Page 11: XML is a syntax for trees

• Not all data is easily represented by trees

• XML encodes a structure purely on the syntactic level

• XML structures must be accompanied by semantic descriptions

Page 12: XML encoding

• XML documents can use a wide array of characters, defined by Unicode

– Unicode currently defines more than 100,000 characters

<?xml version="1.0" encoding="UTF-8"?>

• XML processors must support UTF-8 and UTF-16
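A minimal sketch of what supporting both encodings means in practice, written in Python with the standard-library xml.dom.minidom parser. The choice of Python, the sample text and the helper name parse_with_encoding are assumptions made for illustration and are not part of the lecture. The same small document is encoded once as UTF-8 and once as UTF-16, and should parse to the same text either way.

```python
from xml.dom.minidom import parseString

# Sample text with non-ASCII characters (illustrative only).
TEXT = "café €5"


def parse_with_encoding(encoding: str) -> str:
    """Encode a tiny document in `encoding`, parse the bytes, return its text."""
    xml_text = f'<?xml version="1.0" encoding="{encoding}"?><note>{TEXT}</note>'
    raw_bytes = xml_text.encode(encoding)  # the bytes as they would sit in a file
    doc = parseString(raw_bytes)           # the parser decodes them again
    return doc.documentElement.firstChild.data


for enc in ("UTF-8", "UTF-16"):
    print(enc, "->", parse_with_encoding(enc))
```

The byte sequences differ between the two encodings, but the parsed character data should be identical.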

Page 13: Basic Concepts

• XML documents have an XML declaration

• Exactly one document element (the root element)

• Elements are marked up using tags

– Most elements have content, surrounded by start and end tags

• Elements may be nested

– Elements may be repeated

• Elements may have attributes

Page 14: Example 1

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

• root element: <note>

• 4 child elements of root: <to>, <from>, <heading>, and <body>
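The root element and its four children can be checked programmatically. The following is a small sketch using Python's standard xml.dom.minidom module (an assumption made for illustration; the lecture itself uses PHP later on). The variable names are hypothetical.

```python
from xml.dom.minidom import parseString

NOTE_XML = """<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

doc = parseString(NOTE_XML)
root = doc.documentElement  # the single document (root) element

# Keep element children only; the line breaks between tags are text nodes.
child_names = [
    node.tagName for node in root.childNodes if node.nodeType == node.ELEMENT_NODE
]

print("root:", root.tagName)
print("children:", child_names)
```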

Page 15: XML Documents form a Tree Structure

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

Page 16: Example 2

<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

Page 17: Syntax Rules

• All XML elements must have a closing tag

• XML tags are case sensitive

• XML elements must be properly nested

• XML documents must have a root element

• XML Attribute values must be quoted
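Each of these rules can be observed by feeding a parser a document that breaks it. The sketch below uses Python's standard xml.dom.minidom (an illustrative choice, not the lecture's PHP). The helper is_well_formed and the sample strings are hypothetical names introduced for this example.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        parseString(xml_text)
        return True
    except ExpatError:
        return False


# One sample per syntax rule; only the first one follows all of them.
SAMPLES = {
    "well-formed": "<note><to>Tove</to></note>",
    "missing closing tag": "<note><to>Tove</note>",
    "case mismatch": "<note><To>Tove</to></note>",
    "improper nesting": "<note><b><i>text</b></i></note>",
    "two root elements": "<note/><note/>",
    "unquoted attribute": "<note date=2018/>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")
```

A well-formedness check like this says nothing about whether the document follows a particular DTD or schema; that is the separate notion of validity.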

Page 18: Entity references and comments

• This will generate an XML error:

<message>if salary < 1000 then</message>

• Use instead:

<message>if salary &lt; 1000 then</message>

• <!-- This is an XML comment -->

• Note: unlike in HTML, white space in an XML document is preserved
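When XML is generated by a program, the entity references are usually produced by an escaping helper rather than typed by hand. A minimal sketch in Python follows, using the standard-library function xml.sax.saxutils.escape and the xml.dom.minidom parser (both illustrative choices). The variable names are assumptions.

```python
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 then"

# escape() replaces &, < and > with the corresponding entity references.
escaped = escape(raw_text)
document = f"<message>{escaped}</message>"

# Parsing turns the entity reference back into the original character.
round_trip = parseString(document).documentElement.firstChild.data

print(document)
print(round_trip)
```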

Page 19: XML attributes vs Elements

• <person sex="female">
    <firstname>Anna</firstname>
    <lastname>Smith</lastname>
  </person>

• <person>
    <sex>female</sex>
    <firstname>Anna</firstname>
    <lastname>Smith</lastname>
  </person>

• Where possible, use elements rather than attributes

Page 20: Why avoid XML attributes?

• Problems with using attributes:

– attributes cannot contain multiple values (elements can)

– attributes cannot contain tree structures (elements can)

– attributes are not easily expandable (for future changes)
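The first limitation can be seen directly in a parser. In this sketch (Python's standard xml.dom.minidom, with a hypothetical phone field that does not appear in the lecture) a repeated element yields two values, while a repeated attribute is rejected as not well-formed.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# An element may be repeated, so it can carry several values.
with_elements = parseString(
    "<person><phone>111</phone><phone>222</phone></person>"
)
phones = [p.firstChild.data for p in with_elements.getElementsByTagName("phone")]

# An attribute name may appear only once per element.
try:
    parseString('<person phone="111" phone="222"/>')
    duplicate_attribute_accepted = True
except ExpatError:
    duplicate_attribute_accepted = False

print(phones)
print(duplicate_attribute_accepted)
```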

Page 21: Valid XML Documents

• A valid XML document is a well-formed document which also conforms to the rules of a Document Type Definition (DTD):

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Page 22: XML DTD

• DOCTYPE is a reference to an external DTD file

• DTD: defines the structure of an XML document (shown here declared inside the DOCTYPE itself)

<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>

Page 23: XML Schema

• W3C supports an XML-based alternative to DTD called XML Schema:

<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="heading" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Page 24: XML DTD

<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>

• !DOCTYPE note defines that the root element of this document is note

• !ELEMENT note defines that the note element contains four elements: "to,from,heading,body"

• !ELEMENT to defines the to element to be of type "#PCDATA"

• !ELEMENT from defines the from element to be of type "#PCDATA"

• !ELEMENT heading defines the heading element to be of type "#PCDATA"

• !ELEMENT body defines the body element to be of type "#PCDATA"

Page 25: XML DTD

• External DTD declaration:

<!DOCTYPE root-element SYSTEM "filename">

• XML file:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
…

• DTD file:

<!ELEMENT … >
…

Page 26: DTD View: XML building blocks

• Elements

• Attributes: extra information about elements. They come in name/value pairs

• Entities: &lt;, &gt;, &amp;, &quot;, &apos;

• PCDATA: Parsed character data. The text that will be parsed by a parser for entities and markup

• CDATA: Character data. The text that will not be parsed by a parser (tags inside the text will not be treated as markup)
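The difference between parsed and unparsed text can be illustrated with a CDATA section inside a document, a related construct that also switches off markup recognition. This is a sketch with Python's standard xml.dom.minidom. The element names msg, plain and raw are hypothetical.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString(
    "<msg>"
    "<plain>a &lt; b</plain>"                   # parsed: the entity is expanded
    "<raw><![CDATA[a < b <b>bold</b>]]></raw>"  # unparsed: kept literally
    "</msg>"
)

plain_node = doc.getElementsByTagName("plain")[0].firstChild
raw_node = doc.getElementsByTagName("raw")[0].firstChild

print(plain_node.data)  # text after entity expansion
print(raw_node.data)    # literal text, including the would-be tags

# The <b> inside the CDATA section never became an element.
bold_elements = doc.getElementsByTagName("b")
print(len(bold_elements))
```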

Page 27: Declaring elements

• <!ELEMENT element-name category> or <!ELEMENT element-name (element-content)>

– <!ELEMENT element-name EMPTY>

– <!ELEMENT element-name (#PCDATA)>

– <!ELEMENT element-name (child1,child2,...)>

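– <!ELEMENT note (message+)> (one or more message elements)

– <!ELEMENT note (message*)> (zero or more message elements)

– <!ELEMENT note (to,from,header,(message|body))> (the last child is either message or body)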

Page 28: Declaring attributes

• <!ATTLIST element-name attribute-name attribute-type default-value>

– <!ATTLIST payment type CDATA "check">

• Attribute type:

– CDATA: the value is character data

– (en1|en2|..): the value must be one from an enumerated list

– ID: the value is a unique id

• Default-value:

– #REQUIRED: the attribute is required

– #IMPLIED: the attribute is not required

– #FIXED value: the attribute value is fixed

Page 29: XML Schema defines

• elements that can appear in a document

• attributes that can appear in a document

• which elements are child elements

• the order of child elements

• the number of child elements

• whether an element is empty or can include text

• data types for elements and attributes

• default and fixed values for elements and attributes

Page 30: XML, DTD, XML Schema

XML document:

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

DTD:

<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

XML Schema:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Page 31: Referencing an external XML Schema

<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Tove</to>

Page 32: The <schema> Element

• The root of every XML Schema

<xs:schema>
  .........
</xs:schema>

Page 33: Simple Elements and Attributes

• <xs:element name="xxx" type="yyy"/>

• <xs:attribute name="xxx" type="yyy"/>

– xs:string

– xs:decimal

– xs:integer

– xs:boolean

– xs:date

– xs:time

Page 34: Complex Elements

• <employee>
    <firstname>John</firstname>
    <lastname>Smith</lastname>
  </employee>

• <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Page 35: DOM

• W3C standard

• Defines a standard for accessing documents like XML and HTML:

– Objects and properties of all document elements

– Methods to access them

• Three parts:

– Core DOM: standard model for any structured document

– XML DOM: standard model for XML documents

– HTML DOM: standard model for HTML documents

Page 36: XML DOM

• Standard object model for XML

• Standard programming interface for XML

• A standard for how to get, change, add or delete XML elements:

– The entire document is a document node

– Every XML element is an element node

– The text in an XML element is a text node

– Every attribute is an attribute node

– Comments are comment nodes
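The node kinds listed above can be inspected through the nodeType property that the DOM defines on every node. A small sketch follows, using Python's standard xml.dom.minidom as a stand-in DOM implementation. The sample document and the variable names are assumptions for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString('<note id="n1"><!-- a reminder --><to>Tove</to></note>')

note = doc.documentElement            # element node
comment, to = note.childNodes         # comment node, then the <to> element
text = to.firstChild                  # text node inside <to>
attr = note.getAttributeNode("id")    # attribute node

LABELS = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.COMMENT_NODE: "comment",
}

for node in (doc, note, text, attr, comment):
    print(node.nodeName, "->", LABELS[node.nodeType])
```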

Page 37: XML DOM Node Tree

• XML DOM views an XML document as a tree-structure (called a node-tree)

Page 38: Node Parents, Children, Siblings

– In a node tree, the top node is called the root

– Every node, except the root, has exactly one parent node

– A node can have any number of children

– A leaf is a node with no children

– Siblings are nodes with the same parent

• An XML parser reads the XML and converts it into an XML DOM object that can be accessed using different languages
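These relationships map onto standard DOM properties (parentNode, childNodes, firstChild, lastChild, nextSibling, previousSibling). The sketch below walks a small tree with Python's standard xml.dom.minidom, chosen here only as one of the "different languages". The document is written without white space between tags so that every child is an element.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>"
)

root = doc.documentElement      # <note>, the root element
first = root.firstChild         # <to>
second = first.nextSibling      # <from>, a sibling of <to>
last = root.lastChild           # <heading>
leaf = first.firstChild         # the text node "Tove" has no children

print(len(root.childNodes), "children under", root.tagName)
print(second.tagName, "comes after", second.previousSibling.tagName)
print("leaf has children:", leaf.hasChildNodes())
```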

Page 39: The HTML DOM Node Tree

Page 40: DOM Model?

• OK – we have the model – what do we do with it?

– Manipulate it with a programming language!

• In this lecture and practical lesson: PHP

• Next week: JavaScript

Page 41: DOM and PHP: the functions

• Let us consider the following XML file called note.xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Page 42: Load and Output XML

<?php
// Create a DOMDocument object
$xmlDoc = new DOMDocument();

// Load the XML file into it
$xmlDoc->load("note.xml");

// Serialize the internal XML document to a string and print it
print $xmlDoc->saveXML();
?>

Viewed in a browser, the output of this should be:

Tove Jani Reminder Don't forget me this weekend!

(The browser shows only the text; the page source contains the full XML markup.)

Page 43: Looping through XML

<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("note.xml");

// The root element (<note>)
$x = $xmlDoc->documentElement;

// Print the name and value of every child node of the root.
// White space between the elements shows up as #text nodes.
foreach ($x->childNodes as $item) {
  print $item->nodeName . " = " . $item->nodeValue . "<br />";
}
?>
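A loop over childNodes also visits the white-space text nodes that sit between the elements. The following sketch shows one common way to skip them, written with Python's standard xml.dom.minidom rather than PHP (an illustrative assumption). Unlike PHP, minidom elements have no nodeValue of their own, so the text is read from each element's child text node.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = parseString(NOTE_XML).documentElement

# Every child, including the white-space-only text nodes between the tags.
all_names = [node.nodeName for node in root.childNodes]

# Element children only, paired with the text they contain.
pairs = [
    (node.nodeName, node.firstChild.data)
    for node in root.childNodes
    if node.nodeType == Node.ELEMENT_NODE
]

print(all_names)
for name, value in pairs:
    print(f"{name} = {value}")
```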

Page 44: Other interesting functions

• getElementsByTagName();

// $doc is assumed to be a loaded DOMDocument containing <book> elements
$books = $doc->getElementsByTagName( "book" );
foreach( $books as $book ) {
  $authors = $book->getElementsByTagName( "author" );
  $author = $authors->item(0)->nodeValue;
}

– The script uses the getElementsByTagName method to get a list of all of the elements with the given name.

– Within the loop of the book nodes, the script uses the getElementsByTagName method to get the nodeValue for the author tags. The nodeValue is the text within the node.

Page 45: More information (French)

• http://eusebius.developpez.com/php5dom/

• http://www.scriptol.fr/xml/dom.php

• http://durand.iut-amiens.fr/mcr51:cours:dom#chargement_d_un_fichier_xml_ou_d_une_chaine_xml_php