Jan 12, 2016
XML
- Why: The HTML-Dilemma
  • HTML, SGML, XML
- How: Syntax, Concept, Language Elements
  • Basics
  • Well-formed XML-Documents (without DTD)
  • Valid XML-Documents (with DTD)
  • Attributes, Entities, Style Sheets
  • More concepts from the "XML family"
The HTML-Dilemma
HTML - a language to mark up documents
<H1>Heading 1</H1><H2>Heading 2</H2><p>paragraph<p>...
HTML is simple ...
...but unfortunately:
• Extensibility: no semantic markup (tags describe layout, not meaning)
• Structure: no complex structures beyond layout
• Validity: structural weakness (e.g. unclosed <p> tags are tolerated)
SGML
SGML - Rules to define markup languages
+ Metalanguage: Highly flexible
+ Architecture to process data on different media without losing the structure of the data
- Complexity (for users and programmers)
XML: The Language Concept
What is XML? Extensible Markup Language (XML) is a text-based meta-markup language which allows you to define an unlimited number of markup languages based on the standards defined by XML.
Rather than providing a set of pre-defined tags, as with HTML, XML specifies the standards with which you can define your own markup languages with their own sets of tags.
Like SGML, XML is based on the idea of structured markup of data.
(Figure: structure, layout, presentation, content)
• Tags and attributes can be defined individually
• Document structures of any complexity can be described
• XML-documents can, but need not, contain a formal description of their grammar
HTML:
<P><strong>Bosak, Jon</strong> XML, Java, and the future of the Web</P>
XML:
<?xml version="1.0"?>
<ARTICLE>
<AUTHOR>Bosak, Jon</AUTHOR>
<TITLE>XML, Java, and the future of the Web </TITLE>
</ARTICLE>
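The practical gain of semantic markup: a program can ask for the author directly instead of guessing from layout tags such as <strong>. A minimal Python sketch with the standard library's xml.etree.ElementTree (variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# The XML version of the article; the declaration must be the very first text
ARTICLE_XML = """<?xml version="1.0"?>
<ARTICLE>
  <AUTHOR>Bosak, Jon</AUTHOR>
  <TITLE>XML, Java, and the future of the Web</TITLE>
</ARTICLE>"""

root = ET.fromstring(ARTICLE_XML)
# The tag names carry the meaning, so no layout heuristics are needed
author = root.findtext("AUTHOR")
title = root.findtext("TITLE").strip()
print(f"{author}: {title}")
```

With the HTML version, the same program would have to assume that "whatever is in <strong> is the author".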
XML consists of tags: <TAG>content</TAG>
...that are nested: <TAG><OneMoreTag>content</OneMoreTag></TAG>
...and that constitute an XML-document if certain well-formedness rules are met.
Well-formed documents
• Every start tag must be explicitly closed by a matching end tag
• Empty elements (such as <IMG> in HTML) are written in XML either as <IMG/> or as <IMG></IMG>
• Attribute values must be put in quotation marks: <DESCRIPTION status="sale">
• Child markup must nest completely within parent markup, i.e. markup needs to be completely hierarchical (as in SGML)
• Markup characters (< and &) must not appear literally in text; write them as &lt; and &amp;. Without a DTD, all attributes are treated as CDATA
• You should declare the XML version at the start: <?xml version="1.0"?>
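These rules are exactly what a parser checks before anything else. A minimal sketch with Python's xml.etree.ElementTree; is_well_formed is an illustrative helper and the sample strings are made up for the demonstration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if a (non-validating) XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

examples = {
    "closed tags":        "<A><B>text</B></A>",
    "empty element":      "<A><IMG/></A>",
    "unclosed tag":       "<A><B>text</A>",
    "overlapping tags":   "<A><B>text</A></B>",
    "unquoted attribute": "<A status=sale/>",
    "raw ampersand":      "<A>R&D</A>",
    "escaped ampersand":  "<A>R&amp;D</A>",
}
for label, text in examples.items():
    print(f"{label:20} {is_well_formed(text)}")
```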
Well-formed document "ORDER"
<?xml version="1.0" ?>
<ORDER>
  <HEAD>
    <NAME>Mustermann</NAME> <DATE>02.10.1998</DATE> <E-MAIL>[email protected]</E-MAIL>
  </HEAD>
  <BODY>
    <ITEM>
      <DESCRIPTION>cd rom drive</DESCRIPTION> <ARTICLE-NO>123456</ARTICLE-NO> <AMOUNT>5</AMOUNT>
    </ITEM>
    <ITEM>
      <DESCRIPTION>monitor</DESCRIPTION> <ARTICLE-NO>9876</ARTICLE-NO> <AMOUNT>1</AMOUNT>
    </ITEM>
  </BODY>
</ORDER>
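Because the structure is explicit, the nested items can be read without any knowledge of layout. A Python sketch over an abridged copy of the order (E-MAIL omitted); the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<?xml version="1.0"?>
<ORDER>
  <HEAD>
    <NAME>Mustermann</NAME>
    <DATE>02.10.1998</DATE>
  </HEAD>
  <BODY>
    <ITEM><DESCRIPTION>cd rom drive</DESCRIPTION><ARTICLE-NO>123456</ARTICLE-NO><AMOUNT>5</AMOUNT></ITEM>
    <ITEM><DESCRIPTION>monitor</DESCRIPTION><ARTICLE-NO>9876</ARTICLE-NO><AMOUNT>1</AMOUNT></ITEM>
  </BODY>
</ORDER>"""

root = ET.fromstring(ORDER_XML)
customer = root.findtext("HEAD/NAME")
# Walk the nested BODY/ITEM elements; hyphenated names work as plain path steps
items = [
    (item.findtext("DESCRIPTION"), item.findtext("ARTICLE-NO"), int(item.findtext("AMOUNT")))
    for item in root.iterfind("BODY/ITEM")
]
total_amount = sum(amount for _, _, amount in items)
print(customer, items, total_amount)
```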
XML Basics
XML provides rules for defining markup languages. There are two ways of defining these rules (i.e. the grammar of a particular markup language):
• Implicitly: XML-documents are well-formed if they conform to the basic syntax requirements.
• Explicitly: XML-documents can contain a definition of the required/allowed tags and their structure, i.e. a Document Type Definition (DTD). XML-documents that conform to a DTD are valid.
Valid document "Order"
<?xml version="1.0" ?>
<!DOCTYPE ORDER SYSTEM "ORDER.DTD">
<ORDER>
  <HEAD>
    <NAME>Mustermann</NAME> <DATE>02.10.1998</DATE> <E-MAIL>[email protected]</E-MAIL>
  </HEAD>
  <BODY>
    <ITEM>
      <DESCRIPTION>cd rom drive</DESCRIPTION> <ARTICLE-NO>123456</ARTICLE-NO> <AMOUNT>5</AMOUNT>
    </ITEM>
    <ITEM>
      <DESCRIPTION>monitor</DESCRIPTION> <ARTICLE-NO>9876</ARTICLE-NO> <AMOUNT>1</AMOUNT>
    </ITEM>
  </BODY>
</ORDER>
DTD of valid document "Order"
<!ELEMENT ORDER (HEAD, BODY)>
<!ELEMENT HEAD (NAME, DATE, E-MAIL)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT E-MAIL (#PCDATA)>
<!ELEMENT BODY (ITEM)+>
<!ELEMENT ITEM (DESCRIPTION, ARTICLE-NO, AMOUNT)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ELEMENT ARTICLE-NO (#PCDATA)>
<!ELEMENT AMOUNT (#PCDATA)>
ORDER.DTD
Declaration of elements in a DTD
Elements can contain other elements or character data
<!ELEMENT HEAD (NAME, DATE, E-MAIL)>
<!ELEMENT NAME (#PCDATA)>
Elements can have mixed content
<!ELEMENT a (#PCDATA | b | c)*>
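Elements can be defined as mandatory, optional, repeatable, etc. ("," = sequence, "|" = choice, "?" = optional, "*" = zero or more, "+" = one or more):
<!ELEMENT a (b, c?, (d|e)+, f*)>
<!ELEMENT e-mail (address, cc*, message, signature?)>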
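These operators map directly onto regular expressions, which is one common way to check a content model. A sketch, assuming only the syntax shown above (names, #PCDATA, ",", "|", "?", "*", "+", parentheses; no EMPTY or ANY); model_to_regex and matches are hypothetical helpers, not part of any XML library:

```python
import re

# One token: #PCDATA, an element name, or a punctuation character of the model
TOKEN = re.compile(r"\s*(#PCDATA|[A-Za-z_][\w.\-]*|[(),|?*+])")

def model_to_regex(model):
    """Translate a DTD content model such as "(b, c?, (d|e)+)" into a regex
    over child element names, each written as "NAME " (name plus a space)."""
    parts = []
    model = model.strip()
    pos = 0
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if m is None:
            raise ValueError(f"cannot parse content model at: {model[pos:]!r}")
        tok = m.group(1)
        pos = m.end()
        if tok == "#PCDATA":
            parts.append("(?:)")         # text contributes no element name
        elif tok == "(":
            parts.append("(?:")          # non-capturing group
        elif tok == ",":
            continue                     # sequence = plain concatenation
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)            # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def matches(model, children):
    """True if the sequence of child element names fits the content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None

print(matches("(b, c?, (d|e)+, f*)", ["b", "d", "e", "f", "f"]))
print(matches("(b, c?, (d|e)+, f*)", ["b", "c"]))
```

The trailing space after each name keeps "NAME" from accidentally matching a prefix of a longer name such as "NAMES".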
Attributes
All elements can contain attributes:
<DESCRIPTION edifact="UNH D0062.1" lala="123">
Attributes have to be declared similar to elements:
<!ATTLIST DESCRIPTION edifact CDATA #REQUIRED>
Attributes can be optional (#IMPLIED), mandatory (#REQUIRED) or fixed (#FIXED), and can have a default value:
<!ATTLIST DESCRIPTION ean     CDATA           #REQUIRED
                      picture CDATA           #FIXED "http://my.pics.de/cd-rom.htm"
                      status  (sale | normal) "normal">
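What a validating parser does with such declarations can be sketched by hand: reject a missing #REQUIRED attribute, insist on the #FIXED value, check enumerations, reject undeclared attributes and fill in defaults. The dictionary format and apply_attlist below are hypothetical, chosen only for illustration:

```python
# Hypothetical in-memory form of the ATTLIST for DESCRIPTION:
# attribute -> (allowed values or None for CDATA, default kind, default value)
DESCRIPTION_ATTLIST = {
    "ean":     (None, "#REQUIRED", None),
    "picture": (None, "#FIXED", "http://my.pics.de/cd-rom.htm"),
    "status":  (("sale", "normal"), "DEFAULT", "normal"),
}

def apply_attlist(attrs, attlist):
    """Check attrs against the declarations and return them with defaults filled in."""
    result = dict(attrs)
    for name, (allowed, kind, value) in attlist.items():
        if name not in result:
            if kind == "#REQUIRED":
                raise ValueError(f"missing required attribute {name!r}")
            if value is not None:        # #FIXED and plain defaults are filled in
                result[name] = value
        elif kind == "#FIXED" and result[name] != value:
            raise ValueError(f"{name!r} must be {value!r}")
        if allowed is not None and name in result and result[name] not in allowed:
            raise ValueError(f"{name!r} must be one of {allowed}")
    undeclared = set(result) - set(attlist)
    if undeclared:
        raise ValueError(f"undeclared attributes: {sorted(undeclared)}")
    return result

print(apply_attlist({"ean": "3034152204082"}, DESCRIPTION_ATTLIST))
```

Note that the attribute lala="123" from the earlier example would be rejected here, because it is not declared.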
Valid XML-Document
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ORDER SYSTEM "order2.dtd">
<ORDER>
  <HEAD edifact="UNH D0062.1">
    <NAME>Mustermann</NAME> <DATE>02.10.1998</DATE> <E-MAIL>[email protected]</E-MAIL>
  </HEAD>
  <BODY>
    <ITEM>
      <DESCRIPTION ean="3034152204082"
                   picture="http://my.pics.de/cd-rom.htm" status="sale">cd rom drive</DESCRIPTION>
      <ARTICLE-NO>123456</ARTICLE-NO> <AMOUNT>5</AMOUNT>
    </ITEM>
    <ITEM>
      .......</ITEM>
  </BODY>
</ORDER>
DTD
<!ELEMENT ORDER (HEAD, BODY)>
<!ELEMENT HEAD (NAME, F-NAME*, DATE, E-MAIL+)>
<!ATTLIST HEAD edifact CDATA #REQUIRED>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT F-NAME (#PCDATA)>
<!ELEMENT E-MAIL (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT BODY (ITEM)+>
<!ELEMENT ITEM (DESCRIPTION, ARTICLE-NO, AMOUNT)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ATTLIST DESCRIPTION ean     CDATA           #REQUIRED
                      picture CDATA           #FIXED "http://my.pics.de/cd-rom.htm"
                      status  (sale | normal) "normal">
<!ELEMENT ARTICLE-NO (#PCDATA)>
<!ELEMENT AMOUNT (#PCDATA)>
Valid XML-documents
• An XML-document is valid if it is well-formed and conforms to the specifications defined in a DTD.
• Any well-formed XML-document can become valid if it is made compliant with a DTD.
• Functionally, a DTD is analogous to a relational database schema or an IDL.
• Applications can use the DTD to check an XML-document instance for structural validity and to create new instances of the defined document type.
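Python's standard library has no DTD validator. A hypothetical minimal check can translate each content model of ORDER.DTD by hand into a regular expression over the names of an element's children and walk the tree. This sketch ignores attributes, entities and stray text; ORDER_DTD, validation_errors and the e-mail address are illustrative placeholders:

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of ORDER.DTD: each model is a regex over child names,
# each name followed by one space; "" stands for (#PCDATA), i.e. no children.
ORDER_DTD = {
    "ORDER": r"HEAD BODY ",
    "HEAD": r"NAME DATE E-MAIL ",
    "BODY": r"(?:ITEM )+",
    "ITEM": r"DESCRIPTION ARTICLE-NO AMOUNT ",
    "NAME": "", "DATE": "", "E-MAIL": "",
    "DESCRIPTION": "", "ARTICLE-NO": "", "AMOUNT": "",
}

def validation_errors(root, dtd, doctype):
    """Return a list of problems; an empty list means the tree fits the DTD."""
    errors = []
    if root.tag != doctype:
        errors.append(f"root is {root.tag}, DOCTYPE says {doctype}")
    for element in root.iter():
        if element.tag not in dtd:
            errors.append(f"undeclared element {element.tag}")
            continue
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(dtd[element.tag], children) is None:
            errors.append(f"{element.tag} has children '{children.strip()}'")
    return errors

VALID = """<ORDER>
  <HEAD><NAME>Mustermann</NAME><DATE>02.10.1998</DATE><E-MAIL>mustermann@example.com</E-MAIL></HEAD>
  <BODY><ITEM><DESCRIPTION>monitor</DESCRIPTION><ARTICLE-NO>9876</ARTICLE-NO><AMOUNT>1</AMOUNT></ITEM></BODY>
</ORDER>"""
INVALID = VALID.replace("<DATE>02.10.1998</DATE>", "")  # HEAD now lacks DATE

print(validation_errors(ET.fromstring(VALID), ORDER_DTD, "ORDER"))
print(validation_errors(ET.fromstring(INVALID), ORDER_DTD, "ORDER"))
```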
Internal DTDs
DTDs can also be part of a document instance:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ORDER [
  <!ELEMENT ORDER (HEAD, BODY)>
  <!ELEMENT HEAD (NAME, DATE, E-MAIL)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT DATE (#PCDATA)>
  <!ELEMENT E-MAIL (#PCDATA)>
  <!ELEMENT BODY (ITEM)+>
  <!ELEMENT ITEM (DESCRIPTION, ARTICLE-NO, AMOUNT)>
  <!ELEMENT DESCRIPTION (#PCDATA)>
  <!ELEMENT ARTICLE-NO (#PCDATA)>
  <!ELEMENT AMOUNT (#PCDATA)>
]>
<ORDER> <HEAD>
<NAME>Mustermann</NAME> .............................
</ORDER>
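An internal DTD only has an effect if the parser validates. Python's xml.etree.ElementTree is built on expat, a non-validating parser: it reads the internal subset (for example to expand entities) but does not enforce the content models. A small sketch with a deliberately incomplete ORDER:

```python
import xml.etree.ElementTree as ET

# The internal DTD requires HEAD and BODY, but the instance omits BODY
DOC = """<?xml version="1.0"?>
<!DOCTYPE ORDER [
  <!ELEMENT ORDER (HEAD, BODY)>
  <!ELEMENT HEAD (NAME)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BODY (#PCDATA)>
]>
<ORDER><HEAD><NAME>Mustermann</NAME></HEAD></ORDER>"""

root = ET.fromstring(DOC)   # no exception: well-formed, although not valid
print([child.tag for child in root])
```

Checking validity therefore needs a validating parser or a separate validation step.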
Logical and physical structure of XML-documents
The logical structure is determined by the sequence of tags in the document.
Irrespective of the logical structure, an XML-document can be divided into any number of physical entities.
Thus, it is possible to combine physically distributed XML-data into one XML-document.
Entity references are used to refer to external data.
Entity references are written between "&" and ";", e.g. &Head;
External entity references
XML-documents can be spread over different files:
<!DOCTYPE ORDER [
  <!ENTITY Head SYSTEM "HeadOrder.xml">
  <!ENTITY ItemsPC SYSTEM "Items/PC1.xml">
  <!ENTITY ItemsCD-ROM SYSTEM "http://cd.de/m2.xml">
]>
<ORDER><CUSTOMER>&Head;</CUSTOMER> <SALESORDER> &ItemsPC;
&ItemsCD-ROM; </SALESORDER>
</ORDER>
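How a parser splices external entities into one document can be sketched with Python's low-level xml.parsers.expat module, which hands each external entity reference to a callback. In this sketch the file contents are hypothetical strings held in a dictionary rather than real files, the http:// entity is left out, and references nested inside an entity are not handled:

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE ORDER [
  <!ENTITY Head SYSTEM "HeadOrder.xml">
  <!ENTITY ItemsPC SYSTEM "Items/PC1.xml">
]>
<ORDER><CUSTOMER>&Head;</CUSTOMER><SALESORDER>&ItemsPC;</SALESORDER></ORDER>"""

# Hypothetical file contents, kept in memory instead of on disk
FILES = {
    "HeadOrder.xml": "<NAME>Mustermann</NAME>",
    "Items/PC1.xml": "<ITEM>monitor</ITEM><ITEM>cd rom drive</ITEM>",
}

def element_names(doc, files):
    """Parse doc, splice in external entities from files, record start tags."""
    names = []

    def on_start(name, attrs):
        names.append(name)

    def on_external(context, base, system_id, public_id):
        # A child parser continues in the context of the entity reference
        sub = parser.ExternalEntityParserCreate(context)
        sub.StartElementHandler = on_start
        sub.Parse(files[system_id], True)
        return 1  # nonzero tells expat the entity was handled

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.ExternalEntityRefHandler = on_external
    parser.Parse(doc, True)
    return names

print(element_names(DOC, FILES))
```

The logical structure (ORDER, CUSTOMER, NAME, ...) comes out as if everything had been written in one file.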
XML Entities, Unicode
<!DOCTYPE EXAMPLE
[ <!ENTITY xml "Extensible Markup Language"> ]>
<EXAMPLE>The new standard &xml; supports international character sets (ISO-10646 (Unicode)); the example shows different notations for the number "1":
1 (in ASCII), ١ (in Arabic-Indic), १ (in Devanagari) and ൧ (in Malayalam).
</EXAMPLE>
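A short Python check of this example, assuming the digits are written as character references (&#x661; etc.) instead of being typed directly. ElementTree expands both the internal entity &xml; and the character references, and unicodedata names each resulting digit:

```python
import unicodedata
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE EXAMPLE [
  <!ENTITY xml "Extensible Markup Language">
]>
<EXAMPLE>&xml;: 1 &#x661; &#x967; &#xD67;</EXAMPLE>"""

root = ET.fromstring(DOC)
# &xml; is replaced by its declared text, each &#x...; by that Unicode character
print(root.text)
for ch in root.text.split(": ")[1].split():
    print(ch, unicodedata.name(ch), unicodedata.digit(ch))
```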
Presentation of XML-documents
XML-documents are presented using style sheets.
A style sheet determines the document’s layout.
Style sheets are referred to by a processing instruction, e.g.: <?xml-stylesheet type="text/css" href="style1.css"?>
W3C is developing XSL, a style sheet language for XML.
In addition, XML-documents can be presented (e.g. in a browser) using CSS, the style sheet language that is also used to display HTML.
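An application finds the associated style sheet by reading that processing instruction. A sketch with Python's xml.dom.minidom, which keeps processing instructions as DOM nodes; the regex that splits the pseudo-attributes is an assumption that only handles simple double-quoted values:

```python
import re
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="style1.css"?>
<ORDER><HEAD><NAME>Mustermann</NAME></HEAD></ORDER>"""

def stylesheets(doc_text):
    """Return the pseudo-attributes of every xml-stylesheet instruction."""
    document = minidom.parseString(doc_text)
    found = []
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # PI data is plain text, so the pseudo-attributes are split by hand
            found.append(dict(re.findall(r'(\w+)="([^"]*)"', node.data)))
    return found

print(stylesheets(DOC))
```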
Why two style sheet languages?
1) CSS: Simple; every element is assigned a layout
2) XSL: More than CSS (Scripting, Transformation), but more complex
ORDER {background-color:blue}
NAME, DATE, E-MAIL {Display:Block; font-size:28pt; font-family:Times,serif}
E-MAIL {color:yellow}
<xsl:template match="ARTICLE-NO">
  <P>
    <xsl:apply-templates/>
  </P>
</xsl:template>
XML and CSS
<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="style1.css"?>
<ORDER> <HEAD>
<NAME>Mustermann</NAME> ..............
</ORDER>
ORDER { Display: Block; background-color: blue; float: left; padding: 15pt}
NAME, DATE, E-MAIL {Display: Block; font-size: 28pt; font-family: Times, serif}
E-MAIL {color:yellow}
BODY {Display: Block; background-color: green; float: left; padding: 12pt}
DESCRIPTION {font-size: 28pt; font-family: Times, sans-serif}
The XML-family
Besides the XML 1.0 specification (a W3C Recommendation since 10 February 1998), there are further W3C initiatives on XML. The most important related standards are:
XLink (Working Draft, 26 July 1999)
XPointer (Working Draft, 9 July 1999)
XML Namespaces (Recommendation, 14 January 1999)
XSL (Working Draft, 21 April 1999)
DOM (Recommendation, 1 October 1998)
RDF (Recommendation, 22 February 1999)
XML Schemas (Working Drafts, 6 May 1999), building on the XML-Data, DCD, SOX and DDML proposals
Linking in XML
• XML supports much more powerful linking capabilities than HTML.
• XLink describes uni- as well as sophisticated multi-directional links.
• XPointer specifies a mechanism for pointing to fragments of a target document even where the target defines no identifiers, unlike HTML fragment links such as "book.html#section2", which require a named anchor.
(Figure: simple link; extended link (XLink); link to an element within an instance (XPointer))
Namespaces in XML
How can an application know which namespace is relevant if different DTDs are in use (e.g. for one's own documents, for data exchange or for search engines)?
Namespaces were developed to prevent element and attribute names from colliding. Example: "title" can mean a heading or a title of ownership (e.g. to real estate).
<EXAMPLE xmlns:h="http://www.w3.org/html4"
         xmlns:b="http://www.my.server.de/bibliography"
         xmlns:p="http://www.my.server.de/claims">
  <h:caption>My XML text</h:caption>
  <b:title>XML, Java and the future of the Web</b:title>
  <p:title>realty</p:title>
</EXAMPLE>
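A namespace-aware parser resolves each prefix to its URI, so b:title and p:title become distinct names. A sketch with Python's xml.etree.ElementTree, which writes qualified names as {URI}localname; the query prefixes bib and claim are arbitrary and only have to map to the same URIs as in the document:

```python
import xml.etree.ElementTree as ET

DOC = """<EXAMPLE xmlns:h="http://www.w3.org/html4"
         xmlns:b="http://www.my.server.de/bibliography"
         xmlns:p="http://www.my.server.de/claims">
  <h:caption>My XML text</h:caption>
  <b:title>XML, Java and the future of the Web</b:title>
  <p:title>realty</p:title>
</EXAMPLE>"""

root = ET.fromstring(DOC)
# Each prefix is replaced by its namespace URI: "{uri}localname"
print([child.tag for child in root])

# Prefixes in a query are local to this mapping, not taken from the document
NS = {"bib": "http://www.my.server.de/bibliography",
      "claim": "http://www.my.server.de/claims"}
book_title = root.findtext("bib:title", namespaces=NS)
claim_title = root.findtext("claim:title", namespaces=NS)
print(book_title, "|", claim_title)
```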