
Module 3:

XML Processing

a.Univ.-Prof. Dr. Werner Retschitzegger

Lecture: IFS in Bioinformatics

Summer semester 2011

Johannes Kepler University Linz, www.jku.ac.at

Institute of Bioinformatics, www.bioinf.jku.at

IFS Information Systems Group, www.ifs.uni-linz.ac.at

© 2011 JKU Linz, Institute of Bioinformatics, Information Systems Group (IFS)

Outline

Introduction
  Motivation
  XML Processing Alternatives – Overview
  Extensions of Existing Languages
  Interfaces to Existing Languages
  Native XML Processing
XPath
XQuery
XML & DB

The following slides are based (among others) on:
  Kay, Michael: XPath 2.0 Programmer's Reference (3rd ed.), Wiley, Aug. 2004.
  Walmsley, Priscilla: XQuery, O'Reilly, March 2007.
  Klettke, Meike; Meyer, Holger: XML & Datenbanken, dpunkt.verlag, Jan. 2003.

Motivation

Huge amount of XML data, steadily growing

We need to "process" it, including its "storage"
  Filter, search, select, join, aggregate
  Create new pieces of information
  Clean, normalize the data
  Update it
  Verify the correctness
  Take actions based on the existing data
  Write complex execution flows
  Store it efficiently

No common architecture as there is for RDBS
  Applications are too heterogeneous

XML Processing Alternatives – Overview

(1) Existing Language Extensions
  Procedural
    JavaScript (ECMA), AJAX, PHP
  Declarative
    SQL/XML – part of the SQL:2003 standard

(2) Interfaces to Existing Languages
  XML APIs – Generic Mapping
    DOM, SAX, StAX
  XML Data Binding – Non-Generic Mapping
    JAXB 2.0 – Java Architecture for XML Binding
    SDO – Service Data Objects (J2EE platform)
    ADO – ActiveX Data Objects (.NET platform)
    EMF – Eclipse Modeling Framework

(3) Native XML Processing
  Pure XML Type System
    XPath, XSLT and XQuery

[Slide annotations naming the courses that cover these topics: UE (exercise) IFS2, VO (lecture) IFS2, VO/UE Model Engineering]

(1) Extensions to Existing Languages

Extension of the type system of existing languages with XML types
  Import of XML data into this type system

Extension of the API
  XML retrieval and manipulation
  XPath-based or XPath-inspired

Example: SQL/XML

Relational Data → XML Data

  Table EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME)

    SELECT e.employee_id,
           XMLElement("Emp", e.first_name||' '||e.last_name) AS result
    FROM employees e
    WHERE employee_id > 200;

    EMPLOYEE_ID RESULT
    ----------- ----------------------------
            201 <Emp>Michael Hartstein</Emp>
            202 <Emp>Pat Fay</Emp>
            203 <Emp>Susan Mavris</Emp>

XML Data → Relational Data

  Table EMP_RESUMES, column RESUME

    <RESUME>
      <FULL_NAME>S.King</FULL_NAME>
      <JOB_HISTORY>
        <JOB_ID>AD_PRES</JOB_ID>
      </JOB_HISTORY>
      …
    </RESUME>

    SELECT e.resume.extract('//JOB_ID/text()') result
    FROM emp_resumes e
    WHERE e.employee_id = 100;

    RESULT
    -------
    AD_PRES

(2) Interfaces to Existing Languages
XML APIs

Generic Mappings

Mapping of XML data to generic XML programmatic APIs
Programming languages (e.g. Java, C#) are used to manipulate the data
Re-serialize it at the end

More details later on …

    <purchaseOrder>
      <lineItem>…</lineItem>
      <lineItem>…</lineItem>
    </purchaseOrder>

    <book>
      <author>…</author>
      <title>…</title>
      …
    </book>

    // One generic node interface serves both documents above
    interface DomNode {
      String getNodeName();
      String getNodeValue();
      void setNodeValue(String nodeValue);
      short getNodeType();
    }
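The generic-mapping idea can be tried out with the DOM implementation in Python's standard library. This is only a minimal sketch: the sample document and its values are made up for illustration, and `xml.dom.minidom` stands in for whichever DOM API is actually used in the course.

```python
from xml.dom import minidom, Node

# Hypothetical sample document, modelled on the <book> fragment above.
XML = "<book><author>Kay</author><title>XPath 2.0</title></book>"

doc = minidom.parseString(XML)
book = doc.documentElement

# Every node, whatever its name, is accessed through the same generic interface.
names = [child.nodeName for child in book.childNodes]

# The text of an element is itself a child node (of type TEXT_NODE).
title_text = book.getElementsByTagName("title")[0].firstChild
assert title_text.nodeType == Node.TEXT_NODE

# Manipulate the data through the generic API ...
title_text.nodeValue = "XPath 2.0 Reference"

# ... and re-serialize it at the end.
serialized = book.toxml()
print(names)
print(serialized)
```

Nothing in the code knows about books or purchase orders; the same calls would work on any document, which is exactly what makes the mapping generic.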

(2) Interfaces to Existing Languages
XML Data Binding

Non-Generic Mappings

Mapping of the XML Schema of the XML data to appropriate code in the target language
Based on this mapping, marshalling / unmarshalling between XML and objects

Advantages
  Abstraction from low-level APIs and the details of the parsing process
  Development effort and error-proneness can be reduced

Disadvantages
  High memory demands for large XML documents
  XML Schema evolution requires regenerating the corresponding classes

    <type name="book-type">
      <sequence>
        <attribute name="year" type="xs:integer"/>
        <element name="title" type="xs:string"/>
        <sequence minoccurs="0">
          <element name="author" type="xs:string"/>
        </sequence>
      </sequence>
    </type>

    <element name="book" type="book-type"/>

    // Derived from book-type by the binding compiler
    interface BookType {
      int getYear();
      String getTitle();
      List<String> getAuthors();
    }

[Figure: Data binding framework. A binding compiler translates the XML Schema into derived classes and interfaces (data abstraction; customization of the translation is possible). XML documents are instances of the schema, objects are instances of the derived classes. Documents are deserialized (unmarshalled) into objects and objects are serialized (marshalled) back into documents, with validation; the objects are accessed via getter/setter methods.]

http://www.rpbourret.com/xml/XMLDataBinding.htm
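To make marshalling and unmarshalling concrete, here is a hand-written sketch of what a binding compiler would generate for the book type. The class `Book`, the functions `unmarshal` and `marshal`, and the sample data are all assumptions for illustration; a real framework such as JAXB derives this code from the schema.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Book:
    """Hand-written stand-in for a class derived from book-type."""
    year: int
    title: str
    authors: List[str] = field(default_factory=list)


def unmarshal(xml_text: str) -> Book:
    """Deserialize a <book> document into a typed Book object."""
    elem = ET.fromstring(xml_text)
    return Book(
        year=int(elem.get("year")),
        title=elem.findtext("title"),
        authors=[a.text for a in elem.findall("author")],
    )


def marshal(book: Book) -> str:
    """Serialize a Book object back into a <book> document."""
    elem = ET.Element("book", {"year": str(book.year)})
    ET.SubElement(elem, "title").text = book.title
    for name in book.authors:
        ET.SubElement(elem, "author").text = name
    return ET.tostring(elem, encoding="unicode")


book = unmarshal('<book year="2004"><title>XPath 2.0</title><author>Kay</author></book>')
book.authors.append("Second Author")  # work with typed fields, not generic nodes
print(marshal(book))
```

The application code touches `year`, `title` and `authors` only; if the schema changes, `Book` and both functions have to be regenerated, which is the disadvantage named above.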

(3) Native XML Processing

Most promising alternative for the future!

The only alternative such that …
  the data is modeled only once
  it is well integrated with the XML Schema type system
  it preserves the logical/physical data independence
  the code deals with non-generic structures
  the code can be optimized automatically

Data is stored
  in plain file systems or in dedicated data stores
  e.g. XML extensions of RDBS

Missing pieces, under development
  procedural logic
  update language
  …

Outline

Introduction
XPath
  Introduction
  XPath 1.0
  XPath 2.0
XQuery
XML & DB


Introduction
Overview

Purpose
  Original goal: selecting document parts for layout purposes (XSL)
  Now used in various XML standards – XML Schema, XPointer
  No XML syntax used – proprietary syntax
  Various selection criteria, e.g., element/attribute names, content, type

Basic Processing Principle
  Tree-based navigation, similar to navigation in a file system
  Starting point is always a certain context – i.e., a tree node specified by an XPath expression
  Navigation and filters modify the context
  Result of an XPath expression = context computed in the last step

Read-only language
  It cannot create nodes or modify existing nodes, except by calling functions written in another language
  However, it can create new atomic values and sequences of existing nodes

W3C Standards
  XPath 1.0, Nov. 1999, ~ 44 pages
  XPath 2.0, Jan. 2007, ~ 250 pages

XPath 1.0
XPath Data Model – 7 Node Types

Note: Root is NOT equal to the root (i.e. outermost) element but rather represents the whole XML document ("document entity")

[Figure: UML class diagram of the XPath 1.0 data model. Classes: Node (StringValue: String), NodeWithChildren, NodeWithoutChildren, Root, Element, Attribute, Text, Comment, ProcessingInstruction, Namespace. Associations: parent/child, outermost element, attribute, namespace, declares, isDefinedBy.]

XPath 1.0
XPath Data Model – Example HandyCatalog1.xml

[Figure: UML object diagram of HandyCatalog1.xml. Legend: each object is labelled "Node Name: Node Type" with its node value; edges denote part-of. The root node contains the root (outermost) element HandyCatalog and a comment. Below it: Producer elements with a name attribute (NOKIA), ProducerNo with a no attribute (h1234), Type elements with a name attribute (8210, 7110), and Weight (text 141g) and Price elements (text 999, 4999) with a contract attribute (yes, no).]

XPath 1.0
XPath Navigation – 13 Axis Names

[Figure: Document tree around the context node, with the regions covered by the axes self, ancestor, ancestor-or-self, parent, child, descendant, descendant-or-self, following, following-sibling, preceding and preceding-sibling]

Parts of an XML document represent nodes of a tree
Processing direction of the XPath processor is depth-first
Further axis names
  attribute
  namespace

XPath 1.0
Hierarchical Operators, Elements/Attributes

[Figure: Document tree used in the examples: root → HandyCatalog → Producer (attribute name) → ProducerNo (attribute no) and Type (attribute name) → Weight and Price (attribute contract)]

Hierarchical Operators / and //
  /                     root node
  //Type                all Type elements at arbitrary depth
  //Type/Price          all Price child elements of Type elements at arbitrary depth

Access to Elements *
  /*                    root element
  //*                   all elements, including the root element
  /HandyCatalog/*/Type  all Type elements which are grandchildren of the HandyCatalog element

Access to Attributes @
  //@*                  all attributes
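The path expressions above can be tried out with Python's `xml.etree.ElementTree`, which implements a subset of XPath 1.0. The sketch below rests on assumptions: the sample document is invented to match the tree in the figure, and because ElementTree has no separate root node, absolute paths are rewritten relative to the outermost element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped like the HandyCatalog tree in the figure.
CATALOG = """
<HandyCatalog>
  <Producer name="NOKIA">
    <ProducerNo no="h1234"/>
    <Type name="8210"><Weight>141g</Weight><Price contract="no">4999</Price></Type>
    <Type name="7110"><Price contract="yes">999</Price></Type>
  </Producer>
</HandyCatalog>
"""

# fromstring() returns the outermost element, i.e. what /* selects in XPath.
root_element = ET.fromstring(CATALOG)

# //Type : all Type elements at arbitrary depth
types = root_element.findall(".//Type")

# //Type/Price : Price children of Type elements at arbitrary depth
prices = root_element.findall(".//Type/Price")

# /HandyCatalog/*/Type : Type grandchildren of HandyCatalog
grandchild_types = root_element.findall("*/Type")

# //* : all elements, including the root element, in document order
all_elements = list(root_element.iter())

# //@* : attribute selection is outside ElementTree's XPath subset,
# so the attributes are collected in plain Python instead.
all_attributes = [(e.tag, k, v) for e in all_elements for k, v in e.attrib.items()]

print([t.get("name") for t in types])
print([p.text for p in prices])
print(len(all_elements), len(all_attributes))
```

A full XPath 1.0 engine would accept the expressions exactly as written on the slide; the rewriting here is a limitation of the library, not of XPath.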

XPath 1.0
Filter

[Figure: Document tree as before: root → HandyCatalog → Producer (attribute name) → ProducerNo (attribute no) and Type (attribute name) → Weight and Price (attribute contract)]

//Type[Price]
  all Type elements containing a Price child element

//Producer[ProducerNo]/Type[Price]
  all Type elements containing a Price child element, whereby the Type elements must be child elements of a Producer element which contains a ProducerNo child element

//Producer[Type/Price]
  all Producer elements containing a Type child element which in turn contains a Price child element

//Type[Weight and Price]
  all Type elements having Weight and Price child elements

//Type[Weight = "141g"]
  all Type elements containing a Weight child element with value 141g

//Type[@name = "7110"]
  all Type elements having an attribute name with value 7110
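The filters above can be checked against a small document. This is a sketch under assumptions: the sample data is invented, and since `xml.etree.ElementTree` supports only simple predicates, the two filters it cannot express (a path inside a predicate, and `and`) are emulated in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second Type deliberately has no Weight.
CATALOG = """
<HandyCatalog>
  <Producer name="NOKIA">
    <ProducerNo no="h1234"/>
    <Type name="8210"><Weight>141g</Weight><Price contract="no">4999</Price></Type>
    <Type name="7110"><Price contract="yes">999</Price></Type>
  </Producer>
</HandyCatalog>
"""
root = ET.fromstring(CATALOG)


def names(elements):
    """Return the name attribute of each element, for compact output."""
    return [e.get("name") for e in elements]


# //Type[Price]
with_price = root.findall(".//Type[Price]")

# //Producer[ProducerNo]/Type[Price]
nested = root.findall(".//Producer[ProducerNo]/Type[Price]")

# //Producer[Type/Price]  (path inside a predicate: emulated in Python)
producers = [p for p in root.iter("Producer") if p.find("Type/Price") is not None]

# //Type[Weight and Price]  (boolean 'and': emulated in Python)
both = [t for t in root.iter("Type")
        if t.find("Weight") is not None and t.find("Price") is not None]

# //Type[Weight = "141g"]
by_weight = root.findall(".//Type[Weight='141g']")

# //Type[@name = "7110"]
by_name = root.findall(".//Type[@name='7110']")

print(names(with_price), names(both), names(by_weight), names(by_name))
```

Each predicate only narrows the node set selected by the step it is attached to; it never changes which step the result comes from, which is why `//Producer[Type/Price]` returns Producer elements and not Price elements.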

XPath 1.0
Union, Index-based Access, Variables

[Figure: Document tree as before: root → HandyCatalog → Producer (attribute name) → ProducerNo (attribute no) and Type (attribute name) → Weight and Price (attribute contract)]

Union |
  //Type/Weight | //Type/Price
    all Weight and Price child elements of Type elements

Index-based access via the node's context position
  //Type[1]
    every Type element that is the first Type child of its parent
  Type[last()]
    last Type child element of the context node

Variable $qname
  from within XPath 1.0, variables can only be referenced
  the variable $qname has to be defined by the application using XPath 1.0 (e.g., XSLT or XQuery)
  Note: XPath 2.0 can also bind values to variables ("for" clause)

XPath 1.0
Path Expressions 1/2

Relative Path
  Location Step[/Location Step]*    (chaining of location steps)
  Processing starts at the current context node (determined e.g. by the preceding location step)

Absolute Path
  /Path
  Processing starts at the root node ("/") INDEPENDENT of the current context

Location Step
  AxisName::NodeTest('['predicate']')*

AxisName – Navigation via axis name (ancestor, etc.)
  Short forms for some axis names
    child::element-name             element-name
    attribute::attname              @attname
    /descendant-or-self::node()/    //
    self::node()                    .
    parent::node()                  ..

XPath 1.0
Path Expressions 2/2

::NodeTest – Node filtering (1)
  Name of the node, or
  Wildcard "*" – arbitrary elements, "@*" – arbitrary attributes, or
  Type of the node on basis of a function (text(), comment(), processing-instruction(), node())
  Result = Set of Nodes

[predicate] – Node filtering (2)
  Is a filter on all nodes selected by NodeTest – e.g., specification of the context position via the node's number
  Multiple predicates are processed from left to right
  Result = Boolean value
  Predicates may again contain location paths
    E.g., selection of a node in case certain elements/attributes exist in the context of this node
    //address[tel/@type="work"]

XPath 1.0
Operators and Functions

XPath Operators
  Node Set Operators
    |, [expr], /, //
  Boolean and Comparison Operators
    or, and, =, !=, <=, <, >=, >
  Arithmetic Operators
    +, -, *, div, mod

XPath Core Function Library – 27 functions available
  Node Set Functions (7)
    last(), position(), count(), id(), local-name()
  String Functions (10)
    contains(string s1, string s2)
    concat(string s1, string s2, string sn*)
  Boolean Functions (5)
    boolean true(), boolean false()
  Number Functions (5)
    number round(number), number sum(node-set)

XPath 2.0
Goals of XPath 2.0

Simplify manipulation of XML Schema-typed content
  Introduction of a type system based on XML Schema

Simplify manipulation of string content
  Regular expressions, changing strings to upper and lower case, etc.

Support related XML standards
  Supports common underlying semantics for XSLT 2.0 and XQuery 1.0
  Data model based on the InfoSet W3C standard

Improve ease of use
  New string / aggregation functions, conditional expression, etc.

Improve interoperability
  Different implementations of the specifications should produce the same result

Improve i18n support
  Support the needs of different languages and cultures worldwide

Maintain backward compatibility
  Large gratuitous incompatibilities were avoided
  Ability to run in backward compatibility mode

Enable improved processor efficiency

XPath 2.0
XPath 2.0 vs. XPath 1.0

70% more language concepts than XPath 1.0

Number of operators has doubled

Number of functions in the standard function library has grown by a factor of four

Minor changes in core syntax

Introduction of a new type system based on XML Schema
  represents a pretty radical overhaul of the language semantics

XPath 2.0
New Features in XPath 2.0 – Overview

Everything is a "sequence" and Sequence Processing
  Construction operators
  Filter
  New set operators in addition to UNION
  Functions for list manipulation
  Aggregation functions

Support of XML Schema's Type System
  Type annotations
  Typed values
  Type expressions

Changes to Path Expressions
  Node tests now also on basis of XML Schema types
  Location steps can now be defined by function calls

New Expressions
  Control primitives: «for» and «if»
  Quantifiers: «some» and «every»

New Operators and New Functions

XPath 2.0
"Everything is a sequence" 1/2

XPath 1.0: Sets of nodes only
  Unordered
  Can't contain duplicates

Sequences
  Are ordered
    (1, 2, 3, 4) is different from (4, 3, 2, 1)
  Can have duplicates
    (1, 2, 3, 4) is different from (1, 1, 2, 3, 4)
  Can have heterogeneous items
    (1, 2, 3, "foo")
  Can't be nested
    (1, 2, (3, 4)) is the same as (1, 2, 3, 4)
    NOTE: Although syntactically correct, nested sequences become unnested
  Identity
    YES: Nodes
    NONE: Atomic values and sequences
    1 is the same as (1)

[Figure: UML class diagram with the classes Sequence, Item {abstract}, Atomic Value and Node; a sequence contains (*) items.]

Remember Lisp?
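The sequence rules can be imitated with flat tuples. The helper `seq` below is purely hypothetical (it is not part of any XPath library); it only mimics how XPath 2.0 builds a sequence from the comma operator.

```python
def seq(*items):
    """Build an XPath-2.0-style sequence: ordered, duplicates kept, never nested."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            # A nested sequence is flattened into its parent.
            flat.extend(seq(*item))
        else:
            flat.append(item)
    return tuple(flat)


print(seq(1, 2, seq(3, 4)))                  # nesting disappears
print(seq(1, 2, 3, 4) == seq(4, 3, 2, 1))    # order matters
print(seq(1, 1, 2) == seq(1, 2))             # duplicates are kept
print(seq(seq(1)) == seq(1))                 # wrapping an item changes nothing
```

Unlike in Lisp, there is no way to build a list of lists: the flattening happens at construction time, so a sequence can only ever hold items.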

XPath 2.0
"Everything is a sequence" 2/2

Consequence of "everything is a sequence"
  Every operand of an expression is a sequence
  Every result of an expression is a sequence

2 characteristics: closure and composability
  The language is closed: every possible operation applied to a sequence again generates a sequence
  Therefore expressions can be nested arbitrarily – composability

Example
  sum(//Type/Price)
  Result = Sequence

XPath 2.0
Sequence Processing 1/2

Union (alternative: | as in XPath 1.0)
  (A, B) union (A, B)  →  (A, B)
  (A, B) union (B, C)  →  (A, B, C)

Intersection
  (A, B) intersect (A, B)  →  (A, B)
  (A, B) intersect (B, C)  →  (B)

  XPath 1.0 versus XPath 2.0
  Determine whether the node $x is included in the /foo/bar node-set
    XPath 1.0: count(/foo/bar) = count(/foo/bar | $x)
    XPath 2.0: $x intersect /foo/bar

Difference
  (A, B) except (A, B)  →  ()
  (A, B) except (B, C)  →  (A)

  XPath 1.0 versus XPath 2.0
  Select all attributes except the one with a given NS-qualified name
    XPath 1.0: @*[not(namespace-uri()='http://example.com' and local-name()='foo')]
    XPath 2.0: @* except @exc:foo
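The three operators work on node identity and return their result without duplicates in document order. The sketch below imitates that behaviour on ElementTree nodes; the helper names (`union`, `intersect`, `except_`) and the sample document are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three nodes A, B, C in document order.
root = ET.fromstring("<foo><bar id='A'/><bar id='B'/><bar id='C'/></foo>")
A, B, C = root.findall("bar")

# Position of every node in document order, keyed by object identity.
doc_order = {id(node): pos for pos, node in enumerate(root.iter())}


def _normalize(nodes):
    """Remove duplicate nodes (by identity) and sort into document order."""
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: doc_order[id(n)])


def union(s1, s2):
    return _normalize(list(s1) + list(s2))


def intersect(s1, s2):
    ids = {id(n) for n in s2}
    return _normalize(n for n in s1 if id(n) in ids)


def except_(s1, s2):
    ids = {id(n) for n in s2}
    return _normalize(n for n in s1 if id(n) not in ids)


def ids_of(nodes):
    return [n.get("id") for n in nodes]


print(ids_of(union([A, B], [B, C])))
print(ids_of(intersect([A, B], [B, C])))
print(ids_of(except_([A, B], [B, C])))

# "$x intersect /foo/bar" is non-empty exactly when $x is one of the bar nodes.
x_is_included = len(intersect([B], root.findall("bar"))) > 0
```

Comparison is by identity, not by value: two distinct `bar` elements with identical content would both survive a union.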

XPath 2.0
Sequence Processing 2/2

List functions
  insert-before((1, 3, 4), 2, 2)  →  (1, 2, 3, 4)
  remove((1, 2, 3), 2)            →  (1, 3)
  index-of((10, 20, 30), 20)      →  2
  empty(())                       →  true
  exists((1, 2, 3))               →  true

Aggregation functions
  sum((1, 2, 3))    →  6    (sum already supported in XPath 1.0)
  count((1, 2, 3))  →  3    (count already supported in XPath 1.0)
  avg((1, 2, 3))    →  2
  min((1, 2, 3))    →  1
  max((1, 2, 3))    →  3

XPath 2.0
Type System

XPath 1.0 supports
  Node-sets
  Booleans
  Strings
  A single numeric data type (double precision floating point)
  → weakly typed language

XPath 2.0 supports
  Sequences as a data type
  All 19 primitive simple types built into XML Schema, like integers, decimals, single precision, dates, times, durations, …
  User-defined data types
  Strong type checking as well as weak type checking
  → hybrid language: satisfies the data-oriented and the document-oriented world

XPath 2.0
Type System – Changes to XPath 1.0 Data Model

[Figure: UML class diagram of the XPath 2.0 data model, extending the XPath 1.0 diagram. Classes: Node (StringValue: String), NodeWithChildren, NodeWithoutChildren, Document, Element, Attribute, Text, Comment, ProcessingInstruction, Namespace, TypedNode (Name: QName?, TypedValue: AtomicValue*, TypeAnnotation: QName?), AtomicValue, XMLSchemaTypes (ComplexTypes, SimpleTypes). Associations: parent/child, outermost element, attribute, namespace, declares, isDefinedBy, and "has" (0..1) links from type annotations to XML Schema types.]

XPath 2.0
Path Expressions – Node Test by Schema Type

Node tests in XPath 1.0
  On basis of the node's name and its 7 predefined types

Node tests in XPath 2.0
  Also on basis of the node's type defined by XML Schema
  For example, select all elements of type Person, regardless of the name
  Useful especially when using a schema with a rich type hierarchy in which many elements can be derived from the same type definition

XPath 2.0
Path Expressions – Function as Location Step

Now, a function call can be used as a location step
  Allows following logical relationships in the document's structure, not just physical relationships given by the hierarchy
  Example: «customer[@id="123"]/find-orders(.)/order-value»
  The person writing a path expression doesn't necessarily need to know how the orders for a customer are found
    supports some kind of information hiding → encapsulation
    the way that they are found can change without invalidating the expression → locality of change

XPath itself does not allow writing the find-orders() function
  you can do this on basis of XQuery or XSLT

XPath 2.0
«for» Expression

Enables iteration over sequences, returning a new value for each member in the argument sequence

    for $line in /po:PurchaseOrder/po:OrderLines/po:Line
    return $line/po:Price * $line/po:Quantity

Similar to xsl:for-each, but it is different in that it is an actual expression that returns a sequence which can, in turn, be processed as such

    fn:sum(
      for $line in /po:PurchaseOrder/po:OrderLines/po:Line
      return $line/po:Price * $line/po:Quantity
    )

[Figure: Document tree: PurchaseOrder → Seller and OrderLines → Line → Price, Quantity, Code]

XPath 2.0
«if» Expression

Depending on whether the expression in parentheses evaluates to true or false, the expression returns the then or else section

    if (/po:PurchaseOrder/po:Seller = 'Bookstore') then 'ok' else 'ko'

Power of XPath 2.0 comes from the ability to combine expressions to create sophisticated requests

    fn:sum(
      for $line in /po:PurchaseOrder/po:OrderLines/po:Line
      return
        if ($line/po:Code) then $line/po:Price * $line/po:Quantity
        else ()
    )

[Figure: Document tree: PurchaseOrder → Seller and OrderLines → Line → Price, Quantity, Code]
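For readers without an XPath 2.0 processor at hand, the «for» and «if» expressions can be mirrored in Python. This is an analogy, not an XPath evaluation: the purchase order, its values and the namespace URI are invented, and list comprehensions stand in for the XPath constructs.

```python
import xml.etree.ElementTree as ET

NS = {"po": "http://example.com/po"}  # hypothetical namespace URI

# Hypothetical purchase order; the second line has no Code element.
ORDER = """
<po:PurchaseOrder xmlns:po="http://example.com/po">
  <po:Seller>Bookstore</po:Seller>
  <po:OrderLines>
    <po:Line><po:Code>A1</po:Code><po:Price>10.0</po:Price><po:Quantity>3</po:Quantity></po:Line>
    <po:Line><po:Price>99.0</po:Price><po:Quantity>1</po:Quantity></po:Line>
  </po:OrderLines>
</po:PurchaseOrder>
"""
root = ET.fromstring(ORDER)
lines = root.findall("po:OrderLines/po:Line", NS)


def amount(line):
    """$line/po:Price * $line/po:Quantity"""
    price = float(line.findtext("po:Price", namespaces=NS))
    quantity = int(line.findtext("po:Quantity", namespaces=NS))
    return price * quantity


# for $line in ... return $line/po:Price * $line/po:Quantity
amounts = [amount(line) for line in lines]

# if (/po:PurchaseOrder/po:Seller = 'Bookstore') then 'ok' else 'ko'
status = "ok" if root.findtext("po:Seller", namespaces=NS) == "Bookstore" else "ko"

# fn:sum(for $line in ... return if ($line/po:Code) then ... else ())
# Returning () contributes nothing to the sequence, like skipping the item here.
total_with_code = sum(amount(line) for line in lines
                      if line.find("po:Code", NS) is not None)

print(amounts, status, total_with_code)
```

The empty sequence `()` in the else branch plays the role of the filter condition in the comprehension: because sequences flatten, an empty result simply vanishes from the output.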

XPath 2.0
Existential «some» and Universal «every» Quantifiers

XPath 1.0 equals operator (=) could compare node-sets
  /students/student/name = "Fred" returns true if any student name is equal to "Fred" → existential quantification
  The same applies to !=, <, >, …
    e.g. /students/student/name != "Fred" returns true if any student name is not equal to "Fred"

XPath 2.0 makes it possible to write explicit quantified expressions – existentially and universally quantified

    some $x in /students/student/name satisfies $x = "Fred"
    every $x in /students/student/name satisfies $x = "Fred"

This formulation is more powerful, because the constraining condition can be anything (not just =, !=, < and so on)

    some $item in //LineItem satisfies (($item/Price * $item/Quantity) > 100)
    some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4
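The semantics of the two quantifiers correspond closely to Python's `any` and `all`. The following sketch uses an invented list of names in place of `/students/student/name`; it illustrates the logic only and does not evaluate XPath.

```python
from itertools import product

# Stands in for the sequence /students/student/name (hypothetical data).
names = ["Fred", "Wilma"]

# some $x in ... satisfies $x = "Fred"
some_fred = any(x == "Fred" for x in names)

# every $x in ... satisfies $x = "Fred"
every_fred = all(x == "Fred" for x in names)

# XPath 1.0 style: name != "Fred" is also existential, so for this data
# both  name = "Fred"  and  name != "Fred"  are true at the same time.
some_not_fred = any(x != "Fred" for x in names)

# some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4
# Several bound variables range over all combinations of their values.
some_pair = any(x + y == 4 for x, y in product((1, 2, 3), (2, 3, 4)))

print(some_fred, every_fred, some_not_fred, some_pair)
```

The third result shows why `!=` on node-sets is easy to misread: it is not the negation of `=`. Writing `every $x in … satisfies $x != "Fred"` states the usually intended meaning explicitly.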

XPath 2.0
String Support Improved

Case conversion
  upper-case('Michael')  →  'MICHAEL'

String concatenation
  concat('Jane', ' ', 'Brown')  →  'Jane Brown'

Complementing the starts-with() function of XPath 1.0
  ends-with() function

Regular expressions supported by 3 functions
  matches(), replace(), and tokenize()
  Example: matches(SSNumber, '\d{3}-\d{2}-\d{4}')

All functions that perform comparison of strings can now use a user-specified collation to do the string comparison
  This allows more intelligent localization of string matching according to the conventions of different languages
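Rough Python counterparts of these string functions are shown below, as an analogy only. Python's `re` module uses a different regular expression dialect than XPath 2.0, so the correspondence holds for simple patterns like the ones here but is not exact in general; the input strings are invented.

```python
import re

# upper-case('Michael') and ends-with('Michael', 'el')
upper = "Michael".upper()
ends = "Michael".endswith("el")

# matches($s, '\d{3}-\d{2}-\d{4}'): true if the pattern occurs anywhere in $s,
# which corresponds to re.search rather than re.fullmatch.
ssn_pattern = r"\d{3}-\d{2}-\d{4}"
is_match = re.search(ssn_pattern, "SSN: 123-45-6789") is not None
no_match = re.search(ssn_pattern, "12-345-6789") is not None

# replace($s, '\s+', ' '): replace every match of the pattern
replaced = re.sub(r"\s+", " ", "Jane    Brown")

# tokenize($s, ',\s*'): split the string at every match of the pattern
tokens = re.split(r",\s*", "red, green,blue")

print(upper, ends, is_match, no_match, replaced, tokens)
```

To require that the whole string has the given form, the pattern has to be anchored with `^` and `$`, in XPath just as with `re.search`.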

XPath 2.0
XPath Functions by Category 1/2

Boolean Functions
  boolean(), false(), true()

Numeric Functions
  abs(), avg(), max(), min()

String Functions
  compare(), concat(), contains()

Date and Time Functions
  current-date(), current-time()

Duration Functions
  days-from-dayTimeDuration(), hours-from-dayTimeDuration()

Aggregation Functions
  count(), avg(), max(), min(), sum()

Functions on URIs
  base-uri(), collection(), doc()

Functions on QNames
  expanded-QName(), local-name-from-QName()

XPath 2.0
XPath Functions by Category 2/2

Functions on Sequences
  empty(), exists()

Functions that Return Properties of Nodes
  base-uri(), data(), document-uri()

Functions that Find Nodes
  collection(), doc(), id(), root()

Functions that Return Context Information
  base-uri(), collection(), current-date()

Diagnostic Functions
  error(), trace()

Functions that Assert a Static Type
  exactly-one(), one-or-more(), zero-or-one()

Outline

Introduction
XPath
XQuery
  Introduction
  For and let clauses
  Adding Elements/Attributes to Results
  Conditional Expressions
  Joins
  Quantifiers
  Distinctness & Grouping
  Sorting & Aggregating
  Structure of an XQuery Program
  Appendix
XML & DB

Introduction
Why XQuery?

Why a "query" language for XML?
  Preserve logical/physical data independence
    Based on an abstract data model, independent of physical data storage
  Declarative programming
    Describe the "what", not the "how"
    Commonalities with functional, imperative and query languages

Why a native query language? Why not SQL?
  We need to deal with the peculiarities of XML
    Hierarchical, ordered, textual, potentially schema-less structure

Why another XML processing language? Why not XSLT?
  The template nature of XSLT was not appealing to DB people
  Not declarative enough

[Figure: Two diagrams relating SQL and XQuery, respectively, to persistent data, transacted data and declarative processing]

Introduction
XPath – XSLT – XQuery

[Figure: Evolution from 1999 to 2007. XSLT 1.0 uses XPath 1.0; XSLT 2.0 uses XPath 2.0; XQuery 1.0 extends XPath 2.0. XPath 2.0 provides the library of functions & operators and shares a common data model, which uses XML Schema. XSLT has an XML-based syntax; XPath and XQuery have a non-XML-based syntax.]

Introduction
XPath – XSLT – XQuery

XPath 2.0
  Common language for navigation, selection, extraction
  Used in XSLT, XQuery, XPointer, XML Schema, XForms, etc.

XSLT 2.0: XML ⇒ XML, HTML, Text
  Loosely-typed scripting language
  Format XML in HTML for display in browser
  Must be highly tolerant of variability/errors in data

XQuery 1.0: XML ⇒ XML
  Strongly-typed query language – enforces input and output types
  Must guarantee safety/correctness of operations on data – side-effect free
  Large-scale database access

Introduction
History

[Figure: Ancestry of XQuery. Quilt draws on XPath, XQL / XQL-99, XSL and XPointer (navigation, path expressions), on XML-QL (variable bindings, flexible structuring of the result) and on SQL and OQL (expressions); XQuery is derived from Quilt.]

Main basis for XQuery was "Quilt"
  XML query language from IBM, INRIA and Software AG

Introduction
XQuery Family of Standards

W3C Recommendations, Jan. 2007
  XQuery 1.0 and XPath 2.0 Functions and Operators
    the functions you can call in XPath expressions and the operations you can perform on XPath 2.0 data types
  XQuery 1.0 and XPath 2.0 Data Model (XDM)
    representation and access for both XML and non-XML sources
  XSLT 2.0 and XQuery 1.0 Serialization
    how to output the results of XSLT 2.0 and XML Query evaluation in XML, HTML or as text
  XML Syntax for XQuery 1.0 (XQueryX)
    an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
  XQuery 1.0 and XPath 2.0 Formal Semantics
    the type system used in XQuery and XSLT 2.0 via XPath, defined precisely for implementers

W3C Working Drafts / Java Community Process
  XQuery Update – Candidate Recommendation since August 2008!
  XQuery and XPath Full Text Search
  XQJ – XQuery API for Java (~ JDBC)

http://www.w3.org/TR/xquery/

Introduction
XQuery = 80% XPath 2.0 + 20% …

FLWOR (for, let, where, order by, return) expressions
  ~ SQL's SELECT-FROM-WHERE

XML construction
  Adding new elements and attributes as well as transformations

Sorting of the result

Operators on types
  Compile & run-time type tests

User-defined functions
  Modularize large queries
  Process recursive data

Strong typing
  Guarantees that the result value conforms to the output type
  Enforced statically or dynamically

Introduction
FLWOR ['floωer] Expression 1/2

Input: XML document

FOR/LET   Iteration (cf. FROM in SQL) and variable binding
          Variables are bound to values of expressions (using XPath)
          → ordered list of tuples of bound variables

WHERE     Selection (cf. WHERE in SQL)
          Filtering of tuples on basis of predicates (optional)
          → filtered list of tuples of bound variables

ORDER BY  Ordering (cf. ORDER BY in SQL)
          Ordering of tuples on basis of ordering expressions (optional)
          → ordered list of tuples

RETURN    Construction (cf. SELECT in SQL)
          Composition of the result (single nodes, ordered forest of nodes or atomic value)

Result = Instance of XPath/XQuery Data Model
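The tuple stream that flows through the clauses of a FLWOR expression can be imitated step by step in Python. This is an analogy under assumptions: the product data is invented, dictionaries stand in for XML nodes, and each list corresponds to the intermediate result of one clause.

```python
# Hypothetical input data standing in for the product elements of a catalog.
products = [
    {"dept": "ACC", "number": 2, "name": "hat"},
    {"dept": "WMN", "number": 1, "name": "shirt"},
    {"dept": "ACC", "number": 3, "name": "belt"},
]

# for $p in products        (FOR: iterate, one tuple per item)
# let $label := upper-case  (LET: bind a further variable, no iteration)
bound = [(p, p["name"].upper()) for p in products]

# where $p/dept = "ACC"     (WHERE: filter the tuples)
filtered = [(p, label) for (p, label) in bound if p["dept"] == "ACC"]

# order by $p/name          (ORDER BY: sort the remaining tuples)
ordered = sorted(filtered, key=lambda t: t[0]["name"])

# return <li>{$label}</li>  (RETURN: construct one result per tuple)
result = [f"<li>{label}</li>" for (_p, label) in ordered]

print(result)
```

As in SQL, the written order of the clauses differs from the conceptual order of construction: the return clause is evaluated last, once per tuple that survived filtering and sorting.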

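Introduction
FLWOR ['floωer] Expression 2/2

[Figure: Syntax diagram of a FLWOR expression. One or more FOR $var IN expr or LET $var := expr clauses (multiple bindings separated by ","), followed by an optional WHERE expr, an optional ORDER BY expr, and RETURN expr. $var in a FOR or LET clause is a variable binding; an expr is an XPath expression and may contain function calls and variable references.]

FLWOR Expressions
  Allow sorting
  Allow joining
  Allow adding elements/attributes to results
  Verbose, but can be clearer

Path Expressions
  Great if just copying certain elements and attributes as is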

Introduction
XQuery Syntax – Some Important Issues

Nested expressions
Compact, non-XML syntax
  BUT all names must be valid XML names
    variables, functions, elements, etc.
    can be associated with a NS
No reserved words
Case-sensitive
  keywords are written in lowercase
No special end-of-line character
XQuery comments are delimited by (: and :)
  allowed wherever (insignificant) whitespace is allowed
  do not appear in the result
  may extend over multiple lines
Whitespace
  allowed almost anywhere – has no significance

Introduction
The XQuery Processing Model

[Figure: Processing pipeline. An XML processor turns the source document (XML) into a source tree; the XQuery processor analyzes and evaluates the XQuery query against it and produces a result tree, which is serialized into a result document (XML) or passed on.]

Running Example

[Figure: UML class diagrams of the three example documents]

Catalog.xml
  Catalog
    Product (*), attribute dept
      Number (1), text
      Name (1), attribute language, text
      ColorChoices (0..1), text
      Desc (0..1), text

Prices.xml
  Prices
    PriceList, attribute effDate
      Prod, attribute num
        Price, attribute currency, text
        Discount, attribute type, text

Order.xml
  Order, attributes num, date, cust
    Item (*), attributes dept, num, quantity, color

for/let and Enclosed Expressions

Using a let clause with a range expression

Using a range expression in a for clause

Multiple for clauses

Multiple variable bindings in one for clause

[Figure: Catalog.xml schema, see Running Example]

Adding Elements/Attributes to Results
Three Use Cases

(1) 1:1 copying of elements/attributes from the input document
  Simple elements
  Complex elements – along with their attributes and children, if any (not just their atomic values!)
  No opportunity to change attributes, children, etc.

(2) Direct element/attribute constructors – a mixture of ...
  Literal content ("hard-coded") – appears as is in the output document
  Expressions within "{}" evaluating to any kind of node (elements, attributes, etc.) and to atomic values
  Using XML syntax (proper nesting, case sensitivity, etc.)

(3) Computed constructors
  Allow for dynamic names of nodes and dynamic values
  Copying tags from the input document but making minor changes (e.g., add an attribute)
  Turning content from the input document into markup

Adding Elements/Attributes to Results
(1) 1:1 Copying from the Input Document

Copy simple elements – name

Copy complex elements – product

[Figure: Catalog.xml schema, see Running Example]

Adding Elements/Attributes to Results
(2) Direct Constructors 1/3

Wrap whole result (name elements) in new ul elements

In addition, wrap each resulting name element in an li element

[Figure: Catalog.xml schema, see Running Example; the hard-coded tags in the queries are marked as literal content]

Add new attributes, copy attribute values / element content
  New attribute name & new value
  New attribute name & copy existing value
  Copy element content (or attribute content), i.e. its typed value, via the data() function

Copy element content and use as attribute values with prefix "P"
  data() function not necessary – automatic "atomization"

[Figure: Catalog.xml schema, see Running Example]

Copy attributes/elements & eliminate certain elements
  Copy dept attributes to new element new_product
  Copy product elements and add as subelements to new_product
  Eliminate the number subelements of product

[Figure: Catalog.xml schema, see Running Example]

Adding Elements/Attributes to Results
(3) Computed Constructors

Turning content into markup
  Attribute values → elements
  Explicit element constructor

[Figure: Catalog.xml schema, see Running Example]

Conditional Expressions

[Figure: Catalog.xml schema, see Running Example]

Joins 1/2

Two-way join in a predicate

Two-way join in a where clause

[Figure: Order.xml and Catalog.xml schemas, see Running Example]

Joins 2/2

Three-way join in a where clause

Outer Join

[Figure: Prices.xml, Order.xml and Catalog.xml schemas, see Running Example]

Quantifiers

Quantified expression using the some keyword

Quantified expression using the every keyword

Combining the not function with the some keyword

Binding multiple variables in a quantified expression

[Figure: Catalog.xml schema, see Running Example]

Distinctness & Grouping

... by department

[Figure: Order schema diagram (Order with attributes num, date, cust)]

Sorting & Aggregating

[Figure: Order schema diagram (Order with attributes num, date, cust)]

Structure of an XQuery Program: Prolog, Body, Modules 1/3

Prolog
  Role
    Links the XQuery expression to the environment in which the expression is embedded
  Parts
    namespace declarations
    schema imports
    default element and function namespace
    function declarations
    function library imports
    global and external variable definitions, etc.
    each declaration is separated by a semicolon

Body
  Contains the XQuery expression within { }

Note!
  A function does not inherit the context from the main body of the query; rather, the context has to be passed as a parameter

Example 1
[Figure: example query with its Prolog and Body marked]

Example 2
[Figure: example query with its Prolog and Body marked]

Module

XQuery style conventions: http://www.xqdoc.org/xquery-style.html

Useful functions available at: http://www.xqueryfunctions.com

Appendix: for and let Clauses

Simple for and let clause

Intermingled for and let clauses

[Figure: Catalog schema diagram]


Appendix: Direct Constructors 1/3

Wrap the content of each number and name element

Get the content of each name element / order by

[Figure: Catalog schema diagram]


Aggregation function – no tags from input document included

Add attributes class & dep

[Figure: Catalog schema diagram]


Enclosed expressions that evaluate to elements

Enclosed expressions that evaluate to attributes

Enclosed expressions with multiple subexpressions

[Figure: Catalog schema diagram]


Appendix: Conditional Expressions

Simple conditional expression

Conditional expression returning multiple expressions

[Figure: Catalog schema diagram]


Appendix: Quantifiers

A where clause with multiple expressions and an exists quantifier

[Figure: Catalog schema diagram]


Appendix: Ordering

The order by clause

Using multiple ordering specifications

[Figure: Order schema diagram (Order with attributes num, date, cust)]

Appendix: Distinctness & Aggregation 1/3

Distinctness on a combination of values

Aggregation – sum

[Figure: Order schema diagram (Order with attributes num, date, cust)]

[Figure: Catalog schema diagram]


Aggregation – count, sum

[Figure: Order schema diagram (Order with attributes num, date, cust)]

Aggregation on multiple values

[Figure: Order schema diagram (Order with attributes num, date, cust)]

Outline

Introduction
XPath
XQuery
XML & DB
  Motivation
  Storage Alternatives
  Access Alternatives
  SQL/XML – SQL:2003 Standard


Motivation: XML and DB – Why?

Existing DBs store large amounts of data
  Publish data as XML documents

Existing DBs should store existing XML documents
  Storage in the DB along with additional "meta" information

Well-known benefits of DBs
  Efficient storage of large amounts of well-structured data
  Structured query language (SQL)
  Optimization
  Views and security mechanisms
  Concurrency control / transactions, more fine-grained than just at the document level
  Recovery techniques

[Figure: databases and XML documents of the form <a> ... <c d=.../> </a>]

DBs are essential cornerstones of today's IT infrastructures; the importance of DBs for Web applications is steadily increasing
"... The Web is one huge database ..."
[The Asilomar Report on Database Research, SIGMOD Record 27(4), Dec. 1998]

Motivation – The Challenge: Different Categories of XML Documents

Data-oriented
  Well-known, fine-grained, typed structure
  Ordering of subelements doesn't matter
  Schema available, defining the structure
  Examples: order, invoice

<Order orderNr="1012">
  <CustomerNr>8596</CustomerNr>
  <Position posNr="1">
    <ProductNr>14896612</ProductNr>
    <Amount>2</Amount>
    ...
  </Position>
  ...
</Order>

Document-oriented
  Semi-structured, coarse-grained, untyped
  Ordering of subelements is significant
  Mixed content is common
  Schema often non-existent or very generic
  Example: Claim

<Claim>A severe <Reason>fire</Reason> damaged the building and claimed <DeathToll>12</DeathToll> lives. First investigations done by police indicate fire raising with <Motive>criminal intent</Motive>.</Claim>

Mixture
  Example: Email

<Email>
  <Sender>[email protected]</Sender>
  ...
  <Recipient>[email protected]</Recipient>
  <Content>All the best for your 110th birthday!</Content>
</Email>
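The difference between the data-oriented and document-oriented categories can be made concrete by checking for mixed content. The following is a minimal Python sketch using the standard-library `xml.etree.ElementTree`; the helper name `has_mixed_content` is hypothetical, and the two documents are shortened versions of the Order and Claim examples.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(elem):
    """Return True if elem has child elements and also non-whitespace text."""
    if len(elem) == 0:
        return False  # no child elements, so the content cannot be mixed
    own_text = (elem.text or "").strip()
    # Text that follows a child element is stored in that child's .tail
    tail_text = "".join(child.tail or "" for child in elem).strip()
    return bool(own_text or tail_text)


order = ET.fromstring(
    '<Order orderNr="1012"><CustomerNr>8596</CustomerNr>'
    '<Position posNr="1"><ProductNr>14896612</ProductNr>'
    '<Amount>2</Amount></Position></Order>'
)
claim = ET.fromstring(
    '<Claim>A severe <Reason>fire</Reason> damaged the building and claimed '
    '<DeathToll>12</DeathToll> lives.</Claim>'
)

order_is_mixed = has_mixed_content(order)   # data-oriented document
claim_is_mixed = has_mixed_content(claim)   # document-oriented document
print(order_is_mixed, claim_is_mixed)
```

A storage scheme that discards the text between child elements, or their order, would lose information for the Claim document but not for the Order document.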

Storage Alternatives: Overview

[Figure: classification tree. Storage alternatives: file system, DBS, hybrid. DBS: native DBS or conventional DBS. Data model: RM, OR, OO, XML. Schema language: no schema, DTD, XML Schema. Non-shredded vs. shredded storage]

File system
  XML documents stored as files at the operating system level
  Additional descriptive attributes and file references may be stored within a DBS

DBS
  XML document stored in the DBS as a whole or shredded, possibly together with descriptive attributes

Hybrid
  XML document or parts thereof stored across DBS and file system
  Redundant or non-redundant storage possible

Storage Alternatives: Native Storage

Conceptual mapping of XML to a fine-grained storage structure
  Transformation into an internal XML tree
  The internal tree often resembles a DOM tree
  Element names are replaced by means of a dictionary

http://www.idealliance.org/proceedings/xml05/ship/58/Native_XML_Databases.HTML

Storage Alternatives: Relational Storage – Heterogeneity 1/2

[Figure: XML concepts compared with relational concepts on three levels]
  Levels: data model level (M2), schema level (M1), instance level (M0)
  XML concepts: Element Type, Attribute, Element, Element Value, Attribute Value
  Relational concepts: Relation, Attribute, Tuple, Value
  Schema level, XML: DTD / XML Schema (optional) with Element Type a, Element Type b, ... and Attribute x, Attribute y, ...
  Schema level, relational: relational schema with Relation A, Relation B, ... and Attribute X, Attribute Y, ...
  Instance level: XML document; relational DB
  Legend: consistsOf, mayConsistOf

Storage Alternatives: Relational Storage – Heterogeneity 2/2

                 XML (DTD)                                  RDBS
Structure        nested                                     flat
Datatypes        basically "STRING" only                    numerous
Values           stored within attributes and               stored within attributes
                 element types (ETs)
Order            elements are ordered                       tuples are not ordered
Identification   just a single attribute of type ID         composite key possible
Relationships    IDREFs (untyped) and nested ETs (typed)    foreign key (typed)
Schema           optional; can also be added after          necessary; created prior to instances;
                 instance creation; schema in form of       not part of the instances
                 tags is part of the instance data
                 ("self-describing")

Storage Alternatives: Relational Storage – Example

[Figure: UML diagram of the DTD below. hotelChain contains hotel*; hotel has the attribute hotelID and the subelements name, category, location, telephone*, room*; room contains roomCat and price]

DTD:

<!ELEMENT hotelChain (hotel*)>
<!ELEMENT hotel (name, category, location, telephone*, room*)>
<!ATTLIST hotel hotelID CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT category (#PCDATA)>
<!ELEMENT location (#PCDATA)>
<!ELEMENT telephone (#PCDATA)>
<!ELEMENT room (roomCat, price)>
<!ELEMENT roomCat (#PCDATA)>
<!ELEMENT price (#PCDATA)>

Storage Alternatives: Relational Storage – Mapping Onto a Schema

Fixed Schema
  Schema is independent of the domain (e.g., a mobile phone catalog) and of the target schema
  No decomposition: the XML document is stored as a whole
  Decomposition: the XML document is "shredded"
  Similarities with the generic XML API approach

Derived Schema
  Schema is derived from the other one
  Similarities with the XML Data Binding approach

User-Defined Schema
  Schema is domain-dependent, but has been designed independently of the target schema

[Figure: matrix of schema options (fixed, derived, user-defined) on the DB side versus the XML side]

Storage Alternatives: Mapping Onto a Fixed RDB Schema

Element name → DB value
Attribute name → DB value
XML value → DB value

Example: Decomposition of the document (content and schema) into a single table
[cf. Florescu et al., IEEE Data Engineering, 1999]

[Figure: object diagram of a sample document with the nodes :hotelChain, :hotel (attribute :hotelID), :name, :category, :location, :telephone, :room, :roomCat, :price]

FixedMappingTable
Source   Ordinal   Name        Target/Value
...
                   location    Vienna
                   telephone   0043/732/2468
                   room
                   roomCat     Suite
...
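A minimal Python sketch of such a fixed mapping, using the standard-library `sqlite3` and `xml.etree.ElementTree`, is given here. It makes several assumptions that are not on the slide: the Target/Value column is split into two columns (`target` for inner nodes, `value` for leaves and attributes), attribute names are prefixed with `@`, and the values for hotelID, name, category and price are made up. Mixed content is ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample document following the hotelChain DTD. hotelID, name, category and
# price are invented values; location, telephone and roomCat are slide values.
DOC = """<hotelChain>
  <hotel hotelID="h1">
    <name>Hotel Example</name>
    <category>4</category>
    <location>Vienna</location>
    <telephone>0043/732/2468</telephone>
    <room><roomCat>Suite</roomCat><price>200</price></room>
  </hotel>
</hotelChain>"""

INSERT = "INSERT INTO FixedMappingTable VALUES (?, ?, ?, ?, ?)"


def shred(conn, root):
    """Store every element and attribute of the document as one table row."""
    conn.execute(
        "CREATE TABLE FixedMappingTable ("
        "source INTEGER, ordinal INTEGER, name TEXT, target INTEGER, value TEXT)"
    )
    counter = [0]  # last node id handed out

    def visit(elem, source, ordinal):
        if len(elem) == 0 and not elem.attrib:
            # Leaf element: keep its text in the value column
            text = (elem.text or "").strip()
            conn.execute(INSERT, (source, ordinal, elem.tag, None, text))
            return
        counter[0] += 1
        node = counter[0]
        conn.execute(INSERT, (source, ordinal, elem.tag, node, None))
        position = 0
        for attr_name, attr_value in elem.attrib.items():
            position += 1
            conn.execute(INSERT, (node, position, "@" + attr_name, None, attr_value))
        for child in elem:
            position += 1  # the ordinal preserves document order
            visit(child, node, position)

    visit(root, 0, 1)


conn = sqlite3.connect(":memory:")
shred(conn, ET.fromstring(DOC))

row_count = conn.execute("SELECT COUNT(*) FROM FixedMappingTable").fetchone()[0]
locations = [r[0] for r in conn.execute(
    "SELECT value FROM FixedMappingTable WHERE name = 'location'")]
# Walking hotel -> room -> roomCat takes one self-join per level
room_cats = conn.execute(
    "SELECT rc.value FROM FixedMappingTable h "
    "JOIN FixedMappingTable r ON r.source = h.target AND r.name = 'room' "
    "JOIN FixedMappingTable rc ON rc.source = r.target AND rc.name = 'roomCat' "
    "WHERE h.name = 'hotel'"
).fetchall()
print(row_count, locations, room_cats)
```

The table layout never changes with the document type, but every path step in a query turns into a self-join, which is why queries and optimization are hard with this approach.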

Storage Alternatives: Mapping Onto a Derived RDB Schema

Element type → DB relation
Attribute → DB attribute
Foreign keys connect elements

Example: Decomposition of the XML Schema into tables ("Basic Inlining")
[cf. Shanmugasundaram et al., VLDB, 1999]

[Figure: UML diagram of the hotelChain schema (hotelChain, hotel, room, roomCat, price)]

hotelChain(hcID)
hotel(hID, hcID, hotelID)
name(nID, hID, value)
category(cID, hID, value)
location(lID, hID, value)
telephone(tID, hID, value)
room(rID, hID)
roomCat(rcID, rID, value)
price(pID, rID, value)

Problem: Fragmentation
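The fragmentation problem can be illustrated with a small Python sketch based on the standard-library `sqlite3`. It creates a subset of the derived tables (category, location and telephone are omitted for brevity) and fills them with made-up sample rows; the column types are assumptions.

```python
import sqlite3

# One relation per element type, connected by foreign keys
SCHEMA = """
CREATE TABLE hotelChain (hcID INTEGER PRIMARY KEY);
CREATE TABLE hotel (hID INTEGER PRIMARY KEY,
                    hcID INTEGER REFERENCES hotelChain(hcID),
                    hotelID TEXT);
CREATE TABLE name (nID INTEGER PRIMARY KEY,
                   hID INTEGER REFERENCES hotel(hID), value TEXT);
CREATE TABLE room (rID INTEGER PRIMARY KEY,
                   hID INTEGER REFERENCES hotel(hID));
CREATE TABLE roomCat (rcID INTEGER PRIMARY KEY,
                      rID INTEGER REFERENCES room(rID), value TEXT);
CREATE TABLE price (pID INTEGER PRIMARY KEY,
                    rID INTEGER REFERENCES room(rID), value TEXT);
"""

# Invented sample data: one hotel with two rooms
DATA = """
INSERT INTO hotelChain VALUES (1);
INSERT INTO hotel VALUES (1, 1, 'h1');
INSERT INTO name VALUES (1, 1, 'Hotel Example');
INSERT INTO room VALUES (1, 1);
INSERT INTO room VALUES (2, 1);
INSERT INTO roomCat VALUES (1, 1, 'Suite');
INSERT INTO price VALUES (1, 1, '200');
INSERT INTO roomCat VALUES (2, 2, 'Single');
INSERT INTO price VALUES (2, 2, '90');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(DATA)

# Listing the rooms of one hotel already needs four joins
rows = conn.execute("""
    SELECT n.value, rc.value, p.value
    FROM hotel h
    JOIN name n ON n.hID = h.hID
    JOIN room r ON r.hID = h.hID
    JOIN roomCat rc ON rc.rID = r.rID
    JOIN price p ON p.rID = r.rID
    WHERE h.hotelID = 'h1'
    ORDER BY r.rID
""").fetchall()
print(rows)
```

Unlike the fixed mapping, the domain is visible in the table names, so ordinary relational indexing and optimization apply; the price is one join per element type that has to be reassembled.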

Storage Alternatives: Mapping Onto a User-Defined RDB Schema

Example: Mapping of the XML Schema into existing tables and attributes

[Figure: UML diagram of the hotelChain schema (hotelChain, hotel, room, roomCat, price)]

Accommodation(ID, Name, Category, TownID)
Town(TownID, TownName, Country)
RoomRates(ID, RoomCat, Rate)
Phone(ID, Phone#, Desc)
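A hand-written mapping into such existing tables might look like the following Python sketch (standard-library `sqlite3` and `xml.etree.ElementTree`). The function `store_hotel`, the column types, the pre-existing Town row, and the reading of the ID columns in RoomRates and Phone as references to Accommodation.ID are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Town (TownID INTEGER PRIMARY KEY, TownName TEXT, Country TEXT);
CREATE TABLE Accommodation (ID TEXT PRIMARY KEY, Name TEXT, Category TEXT,
                            TownID INTEGER REFERENCES Town(TownID));
CREATE TABLE RoomRates (ID TEXT, RoomCat TEXT, Rate TEXT);
CREATE TABLE Phone (ID TEXT, "Phone#" TEXT, "Desc" TEXT);
INSERT INTO Town VALUES (1, 'Vienna', 'Austria');
""")

# Invented sample hotel following the hotelChain DTD
hotel = ET.fromstring(
    '<hotel hotelID="h1"><name>Hotel Example</name><category>4</category>'
    '<location>Vienna</location><telephone>0043/732/2468</telephone>'
    '<room><roomCat>Suite</roomCat><price>200</price></room></hotel>'
)


def store_hotel(conn, hotel):
    """Map one hotel element onto the existing tables."""
    # location is plain text in XML but a foreign key into Town in the DB
    town_name = hotel.findtext("location")
    row = conn.execute(
        "SELECT TownID FROM Town WHERE TownName = ?", (town_name,)
    ).fetchone()
    if row is None:
        town_id = conn.execute(
            "INSERT INTO Town (TownName) VALUES (?)", (town_name,)
        ).lastrowid
    else:
        town_id = row[0]
    hotel_id = hotel.get("hotelID")
    conn.execute(
        "INSERT INTO Accommodation VALUES (?, ?, ?, ?)",
        (hotel_id, hotel.findtext("name"), hotel.findtext("category"), town_id),
    )
    for tel in hotel.findall("telephone"):
        conn.execute(
            'INSERT INTO Phone (ID, "Phone#") VALUES (?, ?)', (hotel_id, tel.text)
        )
    for room in hotel.findall("room"):
        conn.execute(
            "INSERT INTO RoomRates VALUES (?, ?, ?)",
            (hotel_id, room.findtext("roomCat"), room.findtext("price")),
        )


store_hotel(conn, hotel)
accommodation = conn.execute("SELECT * FROM Accommodation").fetchall()
town_count = conn.execute("SELECT COUNT(*) FROM Town").fetchone()[0]
rates = conn.execute("SELECT * FROM RoomRates").fetchall()
print(accommodation, town_count, rates)
```

The lookup in Town is a small instance of the heterogeneity between the two sides: the XML document names the town inline, while the existing schema expects a key into a separate table.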

Storage Alternatives: Mapping Options – Advantages/Disadvantages

Fixed
  - Domain not represented in schema
  - Queries/optimization hard to realize
  + Fixed at DB side: no schema at XML side necessary; best suited for document-oriented XML

Derived
  - The schema on the other side must exist

User-Defined
  + Schema can be designed independently of the target schema
  + Data of existing DBs can be used!
  - Heterogeneity problem!

Derived / User-Defined
  + Domain is represented in schema
  + Optimization mechanisms usable
  + Suited especially for data-oriented XML

[Figure: 3×3 matrix of XML-side schema (fixed, derived, user-defined) versus DB-side schema (fixed, derived, user-defined); cells are labeled fixed mapping, derived mapping, user-defined mapping, or n.a.]

Storage Alternatives: Representation of Mapping Knowledge

"Template-Driven"
  mapping knowledge hard-coded
    queries
    transformation programs

<?xml version="1.0" ?>
<Accommodation xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <sql:query>
    SELECT * FROM Accommodation FOR XML AUTO, ELEMENTS
  </sql:query>
</Accommodation>

"Model-Driven"
  mapping knowledge reified (i.e., stored as metadata)
    as a file, e.g., as an XML document
    in the DB, using DB functionality

<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes"
        xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <ElementType name="Phone" content="textOnly" />
  <ElementType name="Accommodation" sql:relation="Accommodation">
    <element type="Phone" sql:relation="Phone" sql:field="Number">
      <sql:relationship
          key-relation="Accommodation"
          key="AcID"
          foreign-key="AccID"
          foreign-relation="Phone" />
    </element>
  </ElementType>
</Schema>
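The model-driven idea, with mapping knowledge kept as data rather than in code, can be sketched in Python with the standard-library `sqlite3` and `xml.etree.ElementTree`. The dictionary layout of `MAPPING`, the function `publish`, the wrapper element `ROOT`, and the sample rows are assumptions for illustration; only the relation, key and field names are taken from the annotated schema above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Mapping knowledge as data: which relation feeds which element
MAPPING = {
    "element": "Accommodation",
    "relation": "Accommodation",
    "key": "AcID",
    "children": [
        {
            "element": "Phone",
            "relation": "Phone",
            "foreign_key": "AccID",
            "field": "Number",
        }
    ],
}


def publish(conn, mapping):
    """Generic publisher: builds XML from tables as directed by the mapping."""
    root = ET.Element("ROOT")  # hypothetical wrapper element
    parent_sql = (
        f'SELECT {mapping["key"]} FROM {mapping["relation"]} '
        f'ORDER BY {mapping["key"]}'
    )
    for (key_value,) in conn.execute(parent_sql).fetchall():
        parent = ET.SubElement(root, mapping["element"])
        for child in mapping["children"]:
            child_sql = (
                f'SELECT {child["field"]} FROM {child["relation"]} '
                f'WHERE {child["foreign_key"]} = ? ORDER BY {child["field"]}'
            )
            for (value,) in conn.execute(child_sql, (key_value,)).fetchall():
                ET.SubElement(parent, child["element"]).text = value
    return root


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accommodation (AcID INTEGER PRIMARY KEY);
CREATE TABLE Phone (AccID INTEGER, Number TEXT);
INSERT INTO Accommodation VALUES (1);
INSERT INTO Phone VALUES (1, '0043/732/2468');
""")

xml_text = ET.tostring(publish(conn, MAPPING), encoding="unicode")
print(xml_text)
```

Changing the published structure then means editing `MAPPING`, not `publish`; in the template-driven variant the same change would require rewriting the query or the transformation program.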

Access Alternatives: Read-only Query vs. Data Manipulation

Read-only Query
  XML-centered
    Access via XML-based language
    W3C XQuery standard
  DB-centered
    Access via SQL-based language
    SQL/XML, part of the current SQL:2003 standard
  Proprietary mechanism
    Neither DB- nor XML-centered

Data Manipulation
  Current research area
  XQuery Update Facility, W3C Candidate Rec. Aug. 2008
  http://www.w3.org/TR/xqupdate/

SQL/XML First Edition: Part of the SQL:2003 Standard

Storage of XML documents
  Introduction of the new datatype XMLType
  Automatic shredding, which can be customized

Publishing stored data by extending SQL with XML functions
  Functions for retrieving relational data and transforming it into XML (e.g., XMLGen, XMLElement, XMLAgg)

Unfortunately, SQL:2003 pre-dated the XQuery standard
  Therefore no full XQuery functionality available
  cf. SQL:2007 ...

[Figure: RDBS holding XMLType data; SQL with SQL/XML functions; XML documents]

SQL/XML Second Edition: Part of the Forthcoming SQL:2007 Standard

More complete integration of the XQuery data model
  XML datatype will support the XQuery data model
    heterogeneous sequences
    non-well-formed XML data
    full XML Schema support and validation

Advanced query capabilities
  XMLQuery() function
    create XML content using XQuery
  XMLTable() function
    shred XML to relational data using XQuery

Mapping between SQL & XQuery data model
  XMLCAST between XML and SQL types

[Figure: "IBM DB2", from an article by Holger Seubert]
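The shredding idea behind XMLTable(), one row per node matched by a row pattern and one column per relative path, can be imitated in a few lines of Python. This is only an analogy under stated assumptions: the helper `xml_table` is hypothetical, it uses the limited XPath subset of the standard-library `xml.etree.ElementTree` rather than XQuery, and the sample document is invented.

```python
import xml.etree.ElementTree as ET


def xml_table(root, row_path, columns):
    """Return one tuple per node matched by row_path.

    columns maps column names to paths relative to the row node.
    """
    rows = []
    for node in root.findall(row_path):
        rows.append(tuple(node.findtext(path) for path in columns.values()))
    return rows


# Invented document following the hotelChain DTD
chain = ET.fromstring(
    '<hotelChain><hotel hotelID="h1"><name>Hotel Example</name>'
    '<category>4</category><location>Vienna</location>'
    '<room><roomCat>Suite</roomCat><price>200</price></room>'
    '<room><roomCat>Single</roomCat><price>90</price></room>'
    '</hotel></hotelChain>'
)

room_rows = xml_table(chain, "hotel/room", {"roomCat": "roomCat", "price": "price"})
print(room_rows)
```

The resulting tuples have a fixed set of columns and could be inserted into a relational table or joined with existing relational data.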

SQL/XML: What Is It Good For ...

Benefits
  Takes advantage of the entire SQL infrastructure (e.g., triggers, PL/SQL)
  Transactional support
  Scalability, clustering, reliability
  Global optimization (XML and relational)
  Standard implemented and supported by Microsoft, Oracle, IBM, etc.

Drawbacks
  Requires data to be loaded into the DB
    not good for temporary XML data
    not worth the effort for small volumes of data
  Blending of the two languages (SQL, XQuery) isn't natural
  XQuery not supported entirely by DB engines
  No XML updates à la XQuery yet

Literature

Standard specifications
  http://www.w3.org/TR/xpath20/
  http://www.w3.org/TR/xquery/
  http://www.sqlx.org (SQL/XML standard)

Best source (!) on XML & DBS, incl. an extensive overview of available systems:
  http://www.rpbourret.com/xml/XMLDatabaseProds.htm
  http://www.rpbourret.com/xml/XMLAndDatabases.htm

Interesting collection of papers:
  http://www.cs.cornell.edu/People/jai/pubs.html#PaperCategory:PublishingRelationalDataAsXML

GI Working Group "Web und Datenbanken": http://dbs.uni-leipzig.de/webdb/

M. Koran, Evaluierung von XML Datenbanken, Master's thesis, Universität Zürich, October 2006 [http://www.ifi.uzh.ch/index.php?id=490&print=1&no_cache=1]

Books
  H. Katz et al., XQuery from the Experts, Addison Wesley, 2004.
  J. Melton et al., Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann/Elsevier, 2006.
  M. Klettke, H. Meyer, XML & Datenbanken: Konzepte, Sprachen und Systeme, dpunkt, 2003. http://www.xml-und-datenbanken.de/
  E. Rahm, G. Vossen (eds.), Web & Datenbanken: Konzepte, Architekturen, Anwendungen, dpunkt, 2003.
  B. Gorke, XML-Datenbanken in der Praxis, bomots Verlag, 2006.