Querying XML Documents and Data
CBU Summer School 13.8. - 20.8.2007 (2 ECTS)
Prof. Pekka Kilpeläinen
Univ of Kuopio, Dept of Computer Science
[email protected]
Introduction & Motivation
XML appears everywhere. How to query it?
[Figure: XML documents, such as an order and an invoice, on the Internet]
CBU Summerschool '07
Querying XML: Introduction 3
Main Topic: Two XML Query Models
Region algebra
  – for retrieval of structured text
  – "lightweight"
    » reduced language; for ad-hoc files; efficient free implementation
XQuery
  – for general querying/manipulation of XML
  – "heavy"
    » comprehensive and complex language; for (data viewed as) XML only; production-use implementations?
Course Outline
Intro and Arrangements; Structured documents
1 Review of XML Basics
  – 1.1 XML and XML docs; 1.2 Document grammars
  – 1.3 XML DTDs; 1.4 XML Namespaces
  – 1.5 XML Schema
2 Region Algebra and sgrep
3 W3C XQuery, and XPath 2.0
(Apologies for potential disorganization!)
Arrangements
Background:
  – W3C Recommendations (XML, XQuery)
  – Reports (Region Algebra, sgrep)
  – Earlier courses; own research and experiments
Some material (to be posted) at http://www.cs.uku.fi/~kilpelai/CBU07/
Plan: lectures 12 h; hands-on exercises 8 h
Structured Documents
Document:
  – a structured representation of information on some medium (cf. message)
  – normally for a human reader
    » memos, manuals, articles, books, …
  – also application-to-application messages
    » e.g., between clients and servers in Web Services
  – "prose-oriented XML" vs "data-oriented XML"
  – can be treated as a single unit
    » (a web page vs a web site)
Presentation vs Structure
Presentation informs the human reader about the meaning of the text and the roles of its parts
Markup indicates the presentation or the meaning of different parts of the text
  – originally hand-written annotations for the typesetter
  – nowadays primarily codes embedded in digital documents: <Tags>
Markup and Markup Language
Procedural markup
  – commands (start boldface, produce empty line, indent 5 mm, ...)
  – proprietary word processor formats, nroff, TeX, ...
  – a fixed set of markup notations (e.g. nroff, TeX, HTML, SVG, …)
Structure in Documents
Hierarchy or nesting is ubiquitous
  – sections with subsections, etc.
  – (also overlapping hierarchies!)
Linear order is essential in prose documents
  – less important in documents representing data objects
Hypertext and cross-references
XML: proper hierarchies, tree-like structures, with cross-references via attribute values
1 Document Instances and Grammars
Overview of fundamentals, and some details, of XML
  – 1.1 XML and XML documents
  – 1.2 Basics of document grammars
  – 1.3 Basics of XML DTDs
  – 1.4 XML Namespaces
  – 1.5 XML Schema
1.1 XML and XML documents
XML - Extensible Markup Language, W3C Recommendation, February 1998
  – not an official standard, but a stable industry standard
  – 2nd Ed 2000, 3rd Ed 2004, 4th Ed 2006
    » editorial revisions, not new versions of XML 1.0
A simplified subset of SGML, Standard Generalized Markup Language, ISO 8879:1986
  – valid XML documents are also SGML documents
What is XML?
Extensible Markup Language is not a markup language!
  – does not fix a tag set nor its semantics (as markup languages such as HTML do)
XML documents have no inherent (processing or presentation) semantics
  – even though many think that XML is semantic or self-describing; see next
Semantics of XML Markup
Meaning of this XML fragment?
  – The application has to "understand" the tags
  – Still, we are better off with the tags!
What is XML (2)?
XML is
  – a way to use markup to represent information
  – a metalanguage
    » supports definition of specific markup languages through XML DTDs (Document Type Definitions) or Schemas
    » e.g. XHTML, a reformulation of HTML using XML
Often "XML" is used to mean XML + XML technology
Essential Features of XML
Overview of XML essentials
  – many details skipped
  – learn to consult original sources (specifications, documentation etc.) for details!
    » The XML specification is easy to browse
First of all, XML is a textual, or character-based, way to represent data
XML Document Characters
XML documents are made of ISO-10646 (32-bit) characters; in practice, of their 16-bit Unicode subset (used, e.g., in Java)
  – Unicode 2.0 defines almost 39,000 distinct characters
Characters have three different aspects:
  – their identification as numeric code points
  – their representation by bytes
  – their visual presentation
External Aspects of Characters
Documents are stored/transmitted as a sequence of bytes (of 8 bits). An encoding determines how characters are represented by bytes.
  – UTF-8 (a superset of 7-bit ASCII) is the XML default encoding
  – encoding="KOI8-R" should be OK for Cyrillic texts
    » (I cannot comment on parser support)
A font determines the visual presentation of characters
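A minimal Python sketch of the first two aspects, using a sample Cyrillic word of our own choosing (not from the slides): the code points stay the same, while the bytes depend on the encoding. Python's `koi8_r` codec stands in for the KOI8-R encoding mentioned above; the visual presentation (the font) is outside the program's reach.

```python
import xml.etree.ElementTree as ET

text = "Привет"  # sample Cyrillic word, chosen only for illustration

# Aspect 1: identification as numeric code points
code_points = [f"U+{ord(ch):04X}" for ch in text]

# Aspect 2: representation by bytes, which depends on the encoding
utf8_bytes = text.encode("utf-8")   # 2 bytes per Cyrillic letter
koi8_bytes = text.encode("koi8_r")  # 1 byte per letter

print(code_points)
print(len(utf8_bytes), len(koi8_bytes))

# Both byte sequences decode back to the same characters
assert utf8_bytes.decode("utf-8") == koi8_bytes.decode("koi8_r") == text

# UTF-8 agrees with 7-bit ASCII on ASCII text
assert "XML".encode("utf-8") == "XML".encode("ascii")

# Without an encoding declaration, the parser reads the bytes as UTF-8
doc = ET.fromstring(b"<p>" + utf8_bytes + b"</p>")
print(doc.text == text)
```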
XML Encoding of Structure 1
XML is, essentially, a textual encoding scheme of labelled, ordered and attributed trees:
  – internal nodes are elements labelled by type names
  – leaves are text nodes labelled by string values, or empty element nodes
  – the left-to-right order of children of a node matters
  – element nodes may carry attributes
This view is shared by many XML techniques (DOM, XPath, XSLT, XQuery, ...)
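To see this tree view in practice, here is a Python sketch using the standard library's `xml.etree.ElementTree` on a made-up order document. ElementTree stores text that follows a child element in that child's `tail`, so the hypothetical helper `nodes` turns `text` and `tail` back into explicit text nodes; whitespace-only text is skipped for brevity.

```python
import xml.etree.ElementTree as ET

def nodes(elem, depth=0):
    """Hypothetical helper: pre-order walk yielding (depth, kind, label, attributes)."""
    yield depth, "element", elem.tag, dict(elem.attrib)
    if elem.text and elem.text.strip():
        yield depth + 1, "text", elem.text, {}
    for child in elem:  # children in document (left-to-right) order
        yield from nodes(child, depth + 1)
        if child.tail and child.tail.strip():  # text after the child belongs to elem
            yield depth + 1, "text", child.tail, {}

root = ET.fromstring('<order id="o1"><item sku="872-AA">Lawnmower</item>'
                     '<item sku="926-AA"/>Thanks!</order>')
for depth, kind, label, attrs in nodes(root):
    print("  " * depth + f"{kind}: {label!r} {attrs or ''}")
```

The second `item` is an empty element node and thus a leaf, as described above.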
XML Encoding of Structure 2
XML encoding of a tree
  – corresponds to a pre-order walk
  – start of an element node with type name A denoted by a start tag <A>, and its end denoted by an end tag </A>
  – possible attributes written within the start tag: <A attr1="value1" … attrn="valuen">
    » names attr1, …, attrn must be distinct
  – text nodes written as their string value
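The pre-order correspondence can be sketched as a tiny serializer. `Node` and `to_xml` are hypothetical names for illustration; `escape` and `quoteattr` from Python's `xml.sax.saxutils` handle markup characters in text and attribute values. Keeping attributes in a dict makes their names distinct, as required.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr

@dataclass
class Node:
    """Hypothetical element node: a type name, attributes and ordered children."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Node objects or str (text nodes)

def to_xml(node) -> str:
    """Hypothetical serializer: pre-order walk emitting start tag, children, end tag."""
    if isinstance(node, str):  # text node: its (escaped) string value
        return escape(node)
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in node.attrs.items())
    content = "".join(to_xml(child) for child in node.children)
    return f"<{node.name}{attrs}>{content}</{node.name}>"

tree = Node("A", {"attr1": "value1", "attr2": "x < y"},
            [Node("B", children=["some text"]), Node("C"), "more text"])
print(to_xml(tree))
```

For simplicity the empty element `C` is written as `<C></C>`, which is equivalent to `<C/>`.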
XML Encoding of Structure: Example
Attributes
  – name-value pairs attached to elements
  – in the start tag after the element type name
  – forms "..." and '...' are interchangeable
Also:
  – <!-- comments outside other markup -->
  – <?note this would be passed to the application as a processing instruction named 'note'?>
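A short Python check of these points with `xml.etree.ElementTree`, on made-up inputs: both quoting forms give the same attributes, and comments and processing instructions are dropped by default but can be kept with `TreeBuilder(insert_comments=True, insert_pis=True)` (available since Python 3.8).

```python
import xml.etree.ElementTree as ET

# "..." and '...' are interchangeable attribute-value delimiters
double = ET.fromstring('<fig id="f1" descr="a figure"/>')
single = ET.fromstring("<fig id='f1' descr='a figure'/>")
assert double.attrib == single.attrib == {"id": "f1", "descr": "a figure"}

src = "<doc><!-- a comment --><?note pass this on?><p>text</p></doc>"

# By default ElementTree drops comments and processing instructions
plain = ET.fromstring(src)

# A TreeBuilder can be asked to keep them
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
kept = ET.fromstring(src, parser=ET.XMLParser(target=builder))

kinds = []
for child in kept:
    if child.tag is ET.Comment:
        kinds.append(("comment", child.text))
    elif child.tag is ET.ProcessingInstruction:
        kinds.append(("pi", child.text))  # PI target and data joined by a space
    else:
        kinds.append(("element", child.tag))

print(len(plain), len(kept))
print(kinds)
```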
CDATA Sections
"CDATA sections" to include XML markup characters as textual content
<![CDATA[ Here we can easily include markup characters and, for example, code fragments:
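As a small Python illustration (the code fragment is our own example): once parsed, a CDATA section is ordinary text, and ElementTree does not keep the CDATA form; on output it escapes the markup characters instead.

```python
import xml.etree.ElementTree as ET

code = "if (a < b && c > d) { s = '<tag>'; }"
elem = ET.fromstring("<code><![CDATA[" + code + "]]></code>")

# Inside the CDATA section, < and & were plain characters, not markup
assert elem.text == code

# Serializing back, ElementTree escapes the characters instead of writing CDATA
print(ET.tostring(elem, encoding="unicode"))
```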
Two levels of correctness (1)
Well-formed documents
  – roughly: follow the syntax of XML, markup correct (elements properly nested, tag names match, attributes of an element have unique names, ...)
  – violation is a fatal error
Valid documents
  – (in addition to being well-formed) obey an associated grammar (DTD/Schema)
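A minimal sketch of the first level in Python: the standard library's ElementTree parser checks well-formedness only (it does not validate against a DTD or schema), so a well-formedness violation surfaces as a `ParseError`. The helper `is_well_formed` and the samples are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if the (non-validating) parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:  # well-formedness violations are fatal errors
        return False
    return True

samples = {
    "<a><b>ok</b></a>": True,
    "<a><b>crossed</a></b>": False,  # elements not properly nested
    "<a></A>": False,                # end tag does not match (names are case-sensitive)
    '<a x="1" x="2"/>': False,       # attribute names of an element must be unique
}
for text, expected in samples.items():
    print(f"{text!r:28} well-formed: {is_well_formed(text)}")
    assert is_well_formed(text) == expected
```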
XML docs and valid XML docs
XML documents = well-formed XML documents
An XML Processor (Parser)
Reads XML documents and reports their contents to an application
  – relieves the application from details of markup
  – the XML Recommendation specifies:
    » recognition of characters as markup or data; what information to pass to applications; how to check the correctness of documents
  – validation based on comparing the document against its grammar
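One common way a processor reports contents to an application is through callbacks, as in the SAX interface in Python's standard library. In the sketch below the hypothetical `Recorder` class plays the "application"; the parser resolves the markup (tags, the `&amp;` reference) and passes on only elements, attributes and text.

```python
import xml.sax

class Recorder(xml.sax.ContentHandler):
    """Hypothetical toy application that records what the XML processor reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Text may arrive in several chunks; merge adjacent ones
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))

app = Recorder()
xml.sax.parseString(b'<order id="7"><item>Pen &amp; ink</item></order>', app)
for event in app.events:
    print(event)
```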
Next: Basics of document grammars
1.2 Basics of document grammars
DTDs are variations of context-free grammars (CFGs), which are widely used for syntax specification (programming languages, XML, …) and for parser/compiler generation (e.g. YACC/GNU Bison)
  – No knowledge of them is necessary, but connections with CFGs may be informative for those who know about them
Example: the reference "Aho Hopcroft Ullman The Design and Analysis ..." consists of the parts Author Author Author Title PublData. With the production
$\mathit{Ref} \rightarrow \mathit{Author}^{*}\ \mathit{Title}\ \mathit{PublData} \in P$
we have
$\mathit{Author}\ \mathit{Author}\ \mathit{Author}\ \mathit{Title}\ \mathit{PublData} \in L(\mathit{Author}^{*}\ \mathit{Title}\ \mathit{PublData})$
Extended Productions
Notice the regular expressions in productions
  – to describe (potentially infinite) sequences
That is, we are using extended CFGs
  – content models (of a DTD) correspond to regular expressions (in an ECFG production)
  – → the number of an element's children is generally unlimited
1.3 Basics of XML DTDs
A Document Type Declaration provides a grammar (document type definition, DTD) for a class of documents [defined in the XML Rec]
Syntax (in the prolog of a document instance):
<!DOCTYPE rootElemType SYSTEM "ex.dtd" [
  <!-- an optional "internal subset" -->
]>
  – here the "external subset" is in the file ex.dtd
DTD = union of the external and internal subsets
  – the internal subset takes precedence for attribute and entity declarations
Markup Declarations
A DTD consists of markup declarations:
  – element type declarations
    » ≈ productions of ECFGs
  – attribute-list declarations
    » for declared element types
  (these two describe logical structures)
  – entity declarations
    » for physical structures
  – notation declarations
What Do Declarations Look Like?
Element Type Declarations
General form: <!ELEMENT elementTypeName (E)>
where E is a content model, a regular expression of element names
Content model operators:
  E | F : choice
  E, F : concatenation
  E? : optional
  E* : zero or more
  E+ : one or more
  (E) : grouping
Must group: (A,B)|C or A,(B|C), but A,B|C is forbidden
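The correspondence between content models and regular expressions can be made concrete with a small Python sketch. The hypothetical `content_model_to_regex` translates an element-only content model (no `#PCDATA`) into a Python regular expression over the sequence of child names; it assumes the model is syntactically correct (grouping rule already respected) and does not check anything else a DTD requires.

```python
import re

# Tokens of a content model: element names and the operator characters
TOKEN = re.compile(r"[A-Za-z_][\w.\-]*|[(),|?*+]")

def content_model_to_regex(model: str) -> re.Pattern:
    """Hypothetical translator: element-only content model -> Python regex that
    matches child-name sequences written as 'Name1 Name2 ... ' (space after each)."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")                     # grouping without capture
        elif tok == ",":
            continue                                # concatenation is implicit
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)                       # same meaning in both notations
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # one child with this name
    return re.compile("".join(parts))

def matches(model: str, children: list) -> bool:
    """Hypothetical helper: does the sequence of child names belong to L(model)?"""
    word = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(word) is not None

ref = "(Author*, Title, PublData)"
print(matches(ref, ["Author", "Author", "Author", "Title", "PublData"]))
print(matches(ref, ["Title", "Author", "PublData"]))
```

The first call checks the reference example from the document grammars section; the second sequence is rejected because `Author` may not follow `Title`.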
Can declare attributes for elements:
  – name, data type and possible default value
Example:
<!ATTLIST FIG
    id     ID           #IMPLIED
    descr  CDATA        #REQUIRED
    class  (a | b | c)  "a">
Semantics mainly up to the application
  – a validating processor checks that ID attributes are unique and that targets of IDREF attributes exist
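ElementTree does not read attribute types from a DTD, so the ID/IDREF checks of a validating processor are sketched below by hand. The hypothetical `check_ids` assumes that attributes named `id` have type ID and attributes named `ref` have type IDREF; both names are parameters because they are our assumption, not something the parser knows. The sample document is made up.

```python
import xml.etree.ElementTree as ET

def check_ids(root, id_attr="id", idref_attr="ref"):
    """Hypothetical checker returning (duplicate ID values, dangling IDREF values).

    Assumes attributes named id_attr have type ID and those named idref_attr
    have type IDREF; ElementTree itself does not know the DTD's attribute types.
    """
    seen, duplicates = set(), []
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.append(value)  # ID values must be unique
            seen.add(value)
    dangling = []
    for elem in root.iter():
        target = elem.get(idref_attr)
        if target is not None and target not in seen:
            dangling.append(target)  # IDREF must point to an existing ID
    return duplicates, dangling

doc = ET.fromstring('<doc><FIG id="f1" descr="first"/><FIG id="f1" descr="second"/>'
                    '<xref ref="f1"/><xref ref="f9"/></doc>')
print(check_ids(doc))
```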
Mixed, Empty and Arbitrary Content
Mixed content:
<!ELEMENT P (#PCDATA | I | IMG)*>
  – may contain text and elements
Empty content:
Entities
Named storage units of XML documents
Multiple uses:
  – character entities:
    » &#60;, &#x3C; and &lt; all expand to '<' (treated as data, not as start-of-markup)
    » other predefined entities: &amp;, &gt;, &apos; and &quot; expand to &, >, ' and "
  – general entities are shorthand notations:
    <!ENTITY UKU "University of Kuopio">
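A quick Python check with a made-up document: the parser behind ElementTree (expat) replaces character references and predefined entities, and also expands a general entity declared in the internal DTD subset, so the application only sees the resulting text.

```python
import xml.etree.ElementTree as ET

src = """<!DOCTYPE note [
  <!ENTITY UKU "University of Kuopio">
]>
<note>&#60; &#x3C; &lt; &amp; &gt; &apos; &quot; &UKU;</note>"""

note = ET.fromstring(src)
print(note.text)  # references replaced by the characters/text they stand for
```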
Entities (2)
Physical storage units comprising a document
  – parsed entities
    <!ENTITY chap1 SYSTEM "http://myweb/ch1">
  – the document entity is the starting point of processing
  – entities and elements must nest properly:
Unparsed Entities and Parameter Entities
Unparsed entities allow XML documents to refer to external binary objects like graphics files
  – an XML processor handles only text
  – I've rarely used these
Parameter entities are used in DTDs
  – useful for modularizing declarations
We skip these
1.4 XML Namespaces
Documents often comprise parts processed by different applications (and/or defined by different grammars)
  – for example, in XSLT scripts:
[Figure: an XSLT script mixing XSLT elements/instructions with HTML elements]
  – How to manage multiple sets of names?
XML Namespaces (2/5)
Solution:
  – by introducing (arbitrary) local name prefixes, and binding them to (fixed) globally unique URIs
  – for example, the local prefix "xsl:" conventionally used in XSLT scripts
XML Namespaces briefly (3/5)
A namespace is identified by a URI (through the associated local prefix), e.g. http://www.w3.org/1999/XSL/Transform for XSLT
  – conventional but not required to use URLs
  – the identifier has to be unique, but need not be an address
The association is inherited by sub-elements
  – see the next example (of an XSLT script)
<!-- XHTML is the 'default namespace' -->
<xsl:template match="doc/title">
  <H1>
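A minimal Python sketch of prefix binding and inheritance, using a small hypothetical stylesheet (not the full script from the slides). ElementTree replaces each prefix by the URI it is bound to, written as `{URI}localname`; the unprefixed `h1` inherits the default (XHTML) namespace declared on the root, and a different prefix bound to the same URI yields the same name.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
XHTML = "http://www.w3.org/1999/xhtml"

script = f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL}" xmlns="{XHTML}">
  <xsl:template match="doc/title"><h1><xsl:apply-templates/></h1></xsl:template>
</xsl:stylesheet>"""

root = ET.fromstring(script)
# ElementTree names are {namespace-URI}localname; prefixes are gone after parsing
template = root.find(f"{{{XSL}}}template")
heading = template.find(f"{{{XHTML}}}h1")  # unprefixed: default namespace, inherited
print(root.tag)
print(heading.tag)

# The prefix itself is arbitrary; only the URI it is bound to matters
other = ET.fromstring(f'<t:stylesheet version="1.0" xmlns:t="{XSL}"/>')
assert other.tag == root.tag
```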
XML Namespaces briefly (5/5)
Mechanism built on top of basic XML
  – overloads attribute syntax (xmlns:) to introduce namespaces
  – does not affect (DTD) validation:
    » namespace attributes have to be declared for DTD validity
    » all element type names have to be declared (with their initial prefixes!)
  – → other schema languages (XML Schema, Relax NG) are better for validating documents with namespaces
1.5 XML Schemas
A quick look at XML Schema
  – W3C Recommendation, 1st Ed. May 2001; 2nd Ed. Oct 2004:
    » XML Schema Part 0: Primer (readable, non-normative introduction)
    » XML Schema Part 1: Structures
    » XML Schema Part 2: Datatypes
  – W3C Draft (didn't lead anywhere?):
    » Formal Description, 9/2001
Advantages of XML Schema (1)
XML syntax
  – easier for programs to manipulate (than DTDs)
Compatibility with namespaces
  – can validate against declarations from multiple sources
Content datatypes
  – 44 built-in datatypes (including primitive Java datatypes, datatypes of SQL, and XML attribute types)
  – mechanisms to derive user-defined datatypes
  – also used as the types of XQuery
XSDL built-in types (Part 2, Chap. 3)
[Figure: hierarchy of the built-in datatypes; types marked * correspond to XML attribute types, such as CDATA]
NB: all simple values in documents are strings
Advantages of XML Schema (2)
Element names and content types are independent (compare with DTDs)
  – for example, one could define titles
    » of people as "Mr."/"Mrs."/"Ms.", and
    » of chapters as strings
  – → extends the power of CFGs/DTDs,
    » where the non-terminal / tag name alone determines its allowed content
  – (Is this relevant in practice?)
Advantages of XML Schema (3)
Ability to specify uniqueness and keys within selected parts of the document
  – for example, that titles of chapters should be unique; or key attributes of relations
  – uses XPath
Support for schema documentation
  – element annotation with sub-elements documentation (for human readers) and appinfo (for applications)
  – only these contain text (#PCDATA)
Disadvantages of XML Schema
Complexity (esp. Rec Part 1!) vs. added power
  – → a long learning curve
  – → slow adoption by users
Immaturity of implementations (?)
  – W3C web site mentions ~60 tools/processors
  – Apache Xerces claims full XSDL support
  – some features are difficult to implement efficiently
Alternative schema languages have been suggested, too
  – Relax NG
  – Schematron
  – ...
XSDL through Example
  – Next: walk-through of an XML schema example
  – from Chapter 2 of the XML Schema Primer
  – Consider modelling purchase orders like below:
The Purchase Order Schema (5/5)
<!-- Type for Stock Keeping Units (codes for identifying products): -->
<xs:simpleType name="SKU">
  <xs:restriction base="xs:string">
    <!-- defined by a regular expr: -->
    <xs:pattern value="\d{3}-[A-Z]{2}"/>
  </xs:restriction>
</xs:simpleType>
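To see what the pattern facet accepts, here is a Python sketch with the same regular expression. XSD patterns always match the whole value (they are implicitly anchored), hence `fullmatch`. XSD's regex dialect differs from Python's in some details, but for this simple pattern the two should agree; `is_valid_sku` and the candidate values are hypothetical.

```python
import re

# Same pattern as the xs:pattern facet: three digits, a hyphen, two capital letters
SKU = re.compile(r"\d{3}-[A-Z]{2}")

def is_valid_sku(value: str) -> bool:
    """Hypothetical check: XSD patterns must match the entire value, so use fullmatch."""
    return SKU.fullmatch(value) is not None

for candidate in ["872-AA", "926-aa", "8722-AA", "872-AA "]:
    print(repr(candidate), is_valid_sku(candidate))
```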
XSDL: an XML-based grammar formalism
  – W3C Recommendation; alternative to DTDs
    » support for namespaces
    » richer content and attribute datatypes
Well accepted(?) in the XML industry
  – e.g., to describe messages between clients and servers in Web services (see, e.g., Web Services Description Language, Version 2.0, W3C Draft 3/07)