Topics
• The "bigger picture"
  – The "XML sales pitch"
  – XML/XHTML vs. SGML/HTML
  – XML in electronic publishing
  – XML and the future, web 2.0
• XML basics:
  – Building blocks: elements, attributes, …
  – Structural constraints: Well-formed XML
  – Character sets
  – Namespaces
  – Validity: DTDs and XML schemas
Week 0534 Introduction to XML 1
Why Use XML (1)
• Consider a line from a .dat file:
    2394287410|Verbatim|DataLife MF 2HD|10|3.5"|black
• or the XML fragment:
    <product barcode="2394287410">
      <manufacturer>Verbatim</manufacturer>
      <name>DataLife MF 2HD</name>
      <quantity>10</quantity>
      <size>3.5"</size>
      <color>black</color>
    </product>
• Which one is easier to interpret, more robust, easier to use for complex structures?
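The comparison above can be tried out directly. The following is a minimal sketch in Python using the standard-library xml.etree.ElementTree module; the variable names are illustrative and not part of the slides.

```python
import xml.etree.ElementTree as ET

dat_line = '2394287410|Verbatim|DataLife MF 2HD|10|3.5"|black'
xml_fragment = """
<product barcode="2394287410">
  <manufacturer>Verbatim</manufacturer>
  <name>DataLife MF 2HD</name>
  <quantity>10</quantity>
  <size>3.5"</size>
  <color>black</color>
</product>
"""

# Delimited format: the meaning of each field exists only in the reader's code.
fields = dat_line.split("|")
name_from_dat = fields[2]  # position 2 is "name" only by convention

# XML: values are looked up by name, so element order does not matter.
product = ET.fromstring(xml_fragment)
name_from_xml = product.findtext("name")
barcode = product.get("barcode")

print(name_from_dat, "|", name_from_xml, "|", barcode)
```

If a column were inserted into the .dat line, index 2 would quietly return the wrong field, while the lookup by element name would keep working.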
Why Use XML (2)
• Simple syntax
• Self-documenting format
• Support for hierarchical structures
• Simple debugging (for both humans and machines)
• Language and platform independent
• Many different tools
• Growing library of "standard" formats
Main Types of XML Documents
• Narrative-Centric Documents:
  – Largely irregular structure, for instance a novel
• Data-Centric Documents:
  – Regular structure, for instance a telephone directory
• Hybrid Documents:
  – Typically contain highly regular parts mixed with irregular content, e.g. a product catalog
XML/XHTML vs. SGML/HTML
• Problems with SGML/HTML:
  – SGML is a complex markup language
  – HTML is only suitable for narrative documents
  – HTML became a bad mix of structure and layout
  – HTML browsers are too tolerant of markup errors
• The XML/XHTML promise:
  – XML has a simple and extensible structure
  – Suitable for both data and narrative documents
  – XHTML is for structure only; CSS is for layout
  – Enforces strict rules
XML in Electronic Publishing
• Some important XML-applications:– Text transformation/printing: XSLT, XSL-FO, SVG,
• An element is the basic entity in XML
• Consists of a start-tag, content and an end-tag
• Simple content:
    <title>Web page for IMT4501</title>
• Mixed content:
    <p><strong>No</strong>, you can't do <em>that</em>!</p>
• Empty element:
    <br></br>
    <br /> <!-- Short version -->
Attributes
• Extra information about an element
• Example:
    <img height="240" width="320" src="logo.gif" />
• Values are enclosed in a matching pair of quotation marks, either double or single:
    <a href="http://www.hig.no/">HiG</a>
    <a href='http://www.oa.no/'>Oppland Arbeiderblad</a>
• But not a mixed pair:
    <a href="http://www.vg.no/'>VG</a>
    <a href='http://www.cnn.no/">CNN</a>
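A short sketch of how a parser exposes attributes, again assuming Python's standard xml.etree.ElementTree. Both quoting styles parse to the same kind of value, while a mismatched pair is rejected.

```python
import xml.etree.ElementTree as ET

# Attributes are available as a name -> value mapping on the element.
img = ET.fromstring('<img height="240" width="320" src="logo.gif" />')
attrs = dict(img.attrib)

# Double and single quotes are equally valid delimiters.
double = ET.fromstring('<a href="http://www.hig.no/">HiG</a>')
single = ET.fromstring("<a href='http://www.oa.no/'>Oppland Arbeiderblad</a>")

# Opening with one kind of quote and closing with the other is an error.
try:
    ET.fromstring("""<a href="http://www.vg.no/'>VG</a>""")
    mismatched_accepted = True
except ET.ParseError:
    mismatched_accepted = False

print(attrs, double.get("href"), single.get("href"), mismatched_accepted)
```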
Well-formed XML
• One root element
• Correct nesting of elements
• Always a matching end-tag for each element
• Case-sensitive names
• Attribute values in quotes
• An attribute can't appear more than once inside an element
• No comments inside tags
• No unescaped < or & inside text content
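An XML parser treats any violation of these rules as a fatal error. The sketch below, assuming Python's standard xml.etree.ElementTree, feeds it one sample per rule; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "ok": "<a><b>text</b></a>",
    "two roots": "<a/><b/>",
    "bad nesting": "<a><b>text</a></b>",
    "missing end-tag": "<a><b>text</a>",
    "case mismatch": "<a>text</A>",
    "unquoted attribute": "<a x=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "comment inside tag": "<a <!-- note -->>text</a>",
    "unescaped ampersand": "<a>R&D</a>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted; every other one breaks exactly one rule from the list.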
• Text is basically PCDATA (Parsed Character Data):
  – The parser replaces entity references with their values
• CDATA can be used where we do not want the parser to interpret the character data as markup:
    <logiskUttrykk>
      <![CDATA[(len > 0) && (len < 256)]]>
    </logiskUttrykk>
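To see the difference, the same expression can be written once with entity references and once in a CDATA section. This is a sketch assuming Python's standard xml.etree.ElementTree. In parsed character data the parser resolves the references, while inside CDATA the characters < and & are taken literally, so the application receives the same text either way.

```python
import xml.etree.ElementTree as ET

# PCDATA: special characters must be written as entity references.
escaped = ET.fromstring(
    "<logiskUttrykk>(len &gt; 0) &amp;&amp; (len &lt; 256)</logiskUttrykk>"
)

# CDATA: the content is not scanned for markup, so no escaping is needed.
cdata = ET.fromstring(
    "<logiskUttrykk><![CDATA[(len > 0) && (len < 256)]]></logiskUttrykk>"
)

print(escaped.text)
print(cdata.text)
```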
Comments
• Enclosed by <!-- and -->
• Should not appear inside a tag
• A double hyphen -- cannot appear anywhere inside the comment
• Are meant for human readers, not for the application
• Correct use:
    <FotoDB>
      <!-- Example of image database dump -->
      <Image_series> ...
• Wrong use:
    <FotoDB <!-- Example of image database dump -->>
    <!-- Not finished -- look at it later -->
Processing instructions
• Enclosed by <? and ?>
• The target follows right after <?
• Can be used to send information to the application
• Comments were once used for this, but XML parsers may choose not to pass comments on to the application
• Example:
    <?php
      $logged_in = $_SESSION["logged_in"];
      if (!$logged_in) {
        echo "You have to <a href='login.php'>log in</a> first";
      }
    ?>
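A sketch of how an application can receive a processing instruction, assuming Python's standard xml.dom.minidom. The xml-stylesheet instruction and the file name style.css are illustrative and do not come from the slides.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="style.css"?><page>Hello</page>'
)

# The processing instruction is a node in its own right, placed before the root.
pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

# The target is the name right after "<?"; the rest is handed over as raw data.
print(is_pi, pi.target, pi.data)
```

The parser does not interpret the data part; the application named by the target decides what it means.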
Exercise
• Complete the ZVON XML tutorial:
    http://www.zvon.org/xxl/XMLTutorial/General/contents.html
Character Sets
• Historically, character encoding has been a challenge:
  – The same code has been used for different characters on different systems
• Now, there are standards:
  – ISO-8859-1 (ISO Latin 1), "default" on the web
  – Unicode: defines a larger character set, used by XML by default:
    • UTF-8, efficient for western languages
    • UTF-16
    • UTF-32
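The size difference between the encodings is easy to measure. A minimal sketch in Python follows; the sample string "Gjøvik" and the element name by are arbitrary choices, and the -le codec variants are used so that no byte-order mark is counted.

```python
import xml.etree.ElementTree as ET

text = "Gjøvik"  # six characters, one of them outside ASCII

# Number of bytes needed for the same string in each encoding.
sizes = {
    enc: len(text.encode(enc))
    for enc in ("iso-8859-1", "utf-8", "utf-16-le", "utf-32-le")
}
print(sizes)

# A document in another encoding must say so in its XML declaration;
# without a declaration the parser assumes UTF-8.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><by>Gj\xf8vik</by>'
utf8_doc = "<by>Gjøvik</by>".encode("utf-8")
same_text = ET.fromstring(latin1_doc).text == ET.fromstring(utf8_doc).text
print(same_text)
```

UTF-8 spends one byte on each ASCII character and two on "ø", which is why it is compact for mostly western text.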
Namespaces – why?
• Distinguish between elements and attributes from different XML vocabularies
• Namespaces allow two or more XML vocabularies to be used in the same document
• Group all related elements and attributes from a single XML application, making them easier for software to recognize
Namespaces – how?
• A prefix is bound to a vocabulary (identified by a URI) with an xmlns:prefix attribute:
    <Description xmlns:dc="http://purl.org/dc/">
• The prefix is defined within the subtree rooted at the element that declares it
• Elements in a vocabulary are identified by the prefix:
    <Description xmlns:dc="http://purl.org/dc/">
      <dc:title>XML in a Nutshell</dc:title>
      <dc:creator>Elliotte Rusty Harold</dc:creator>
      <dc:creator>W. Scott Means</dc:creator>
      <dc:date>2002</dc:date>
    </Description>
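A namespace-aware parser resolves each prefix to its URI. The sketch below assumes Python's standard xml.etree.ElementTree, which reports names in the form {URI}localname. The ns mapping is a lookup table supplied by the calling code and is not part of the document.

```python
import xml.etree.ElementTree as ET

doc = """<Description xmlns:dc="http://purl.org/dc/">
  <dc:title>XML in a Nutshell</dc:title>
  <dc:creator>Elliotte Rusty Harold</dc:creator>
  <dc:creator>W. Scott Means</dc:creator>
  <dc:date>2002</dc:date>
</Description>"""

root = ET.fromstring(doc)

# The prefix is gone after parsing; the element name carries the URI instead.
title_tag = root[0].tag

# Searching uses a prefix -> URI mapping chosen by the caller.
ns = {"dc": "http://purl.org/dc/"}
creators = [e.text for e in root.findall("dc:creator", ns)]

print(root.tag, title_tag, creators)
```

Description itself has no prefix and no default namespace is declared, so it stays outside any namespace.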
More about the prefix
• You choose the name of the prefix; the URI identifies the vocabulary
• The prefix has to be a legal XML name
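That the prefix is only a local shorthand can be checked with a small experiment, sketched here with Python's standard xml.etree.ElementTree. The element name x, the prefix dublin and the URI http://example.org/other/ are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Same URI, different prefixes: the element names are identical after parsing.
a = ET.fromstring('<x xmlns:dc="http://purl.org/dc/"><dc:title>T</dc:title></x>')
b = ET.fromstring('<x xmlns:dublin="http://purl.org/dc/"><dublin:title>T</dublin:title></x>')
same_vocabulary = a[0].tag == b[0].tag

# Same prefix, different URI: the element names differ.
c = ET.fromstring('<x xmlns:dc="http://example.org/other/"><dc:title>T</dc:title></x>')
different_vocabulary = a[0].tag != c[0].tag

print(same_vocabulary, different_vocabulary)
```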
Namespaces – what is it really?
• A vocabulary identified by a fixed Uniform Resource Identifier:
  – http://...
  – ftp://...
  – …
• The URI has to be unique to make the vocabulary unique
• The URI does not need to point at any defined document
Example scope
Default namespace
• A default namespace can be used where all non-prefixed elements belong to a fixed vocabulary
• Example:
    <RDF xmlns="http://www.w3.org/TC/REC-rdf-syntax#">
      <Description xmlns:dc="http://purl.org/dc/">
        <dc:title>XML in a Nutshell</dc:title>
        <dc:creator>Elliotte Rusty Harold</dc:creator>
        <dc:creator>W. Scott Means</dc:creator>
        <dc:date>2002</dc:date>
      </Description>
    </RDF>
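Parsing this example shows which vocabulary each element ends up in. This is a sketch assuming Python's standard xml.etree.ElementTree, with the URIs copied from the example as given.

```python
import xml.etree.ElementTree as ET

doc = """<RDF xmlns="http://www.w3.org/TC/REC-rdf-syntax#">
  <Description xmlns:dc="http://purl.org/dc/">
    <dc:title>XML in a Nutshell</dc:title>
    <dc:creator>Elliotte Rusty Harold</dc:creator>
    <dc:creator>W. Scott Means</dc:creator>
    <dc:date>2002</dc:date>
  </Description>
</RDF>"""

root = ET.fromstring(doc)
description = root[0]
title = description[0]

# Non-prefixed elements inherit the default namespace from the enclosing scope.
print(root.tag)
print(description.tag)

# Prefixed elements still belong to the vocabulary bound to their prefix.
print(title.tag)
```

Description carries no prefix, yet it belongs to the RDF vocabulary because the default namespace declared on RDF applies to the whole subtree.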
Exercise
• Complete the ZVON namespace tutorial:
    http://www.zvon.org/xxl/NamespaceTutorial/