M3 XML Processing - Sepp Hochreiter
Introduction
  Motivation
  XML Processing Alternatives – Overview
    Extensions of Existing Languages
    Interfaces to Existing Languages
    Native XML Processing
XPath
XQuery
XML & DB
The following slides are based (among others) on:
  Kay, Michael: XPath 2.0 Programmer's Reference (3rd ed.), Wiley, Aug. 2004.
  Walmsley, Priscilla: XQuery, O'Reilly, March 2007.
  Klettke, Meike; Meyer, Holger: XML & Datenbanken, dpunkt.verlag, Jan. 2003.
We need to "process" it, including its "storage"
  Filter, search, select, join, aggregate
  Create new pieces of information
  Clean, normalize the data
  Update it
  Verify the correctness
  Take actions based on the existing data
  Write complex execution flows
  Store it efficiently
No common architecture as there is for RDBS, because applications are too heterogeneous
Declarative
  SQL/XML – part of the SQL:2003 standard
(2) Interfaces to Existing Languages
XML APIs – Generic Mapping
  DOM, SAX, StAX
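The difference between a tree-based and an event-based generic API can be sketched with Python's standard library. This is only an illustration: the sample document and the handler class name are assumptions, not part of the slides.

```python
import xml.dom.minidom
import xml.sax

# Assumed sample document, made up for this illustration.
XML = "<Producer><Type name='7110'><Price>99</Price></Type><Type name='3310'/></Producer>"

# DOM: the whole document is parsed into an in-memory tree of generic nodes.
dom = xml.dom.minidom.parseString(XML)
dom_names = [el.getAttribute("name") for el in dom.getElementsByTagName("Type")]


class TypeNameHandler(xml.sax.ContentHandler):
    """SAX: the parser pushes events; the handler keeps only what it needs."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "Type":
            self.names.append(attrs.get("name"))


handler = TypeNameHandler()
xml.sax.parseString(XML.encode("utf-8"), handler)
sax_names = handler.names

print(dom_names, sax_names)
```

Both variants work on generic nodes or events (elements, attributes), not on application-specific classes, which is what "generic mapping" refers to here.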
XML Data Binding – Non-Generic Mapping
  JAXB 2.0 – Java Architecture for XML Binding
  SDO – Service Data Objects (J2EE platform)
  ADO – ActiveX Data Objects (.NET platform)
  EMF – Eclipse Modeling Framework
(2) Interfaces to Existing Languages – XML Data Binding
Non-Generic Mappings
  The XML Schema of the XML data is mapped to appropriate code in the target language
  Based on this mapping, marshalling / unmarshalling converts between XML and objects
Advantages
  Abstraction from low-level APIs and the details of the parsing process
  Development effort and error-proneness can be reduced
Disadvantages
  High memory demands for large XML documents
  XML Schema evolution requires regenerating the corresponding classes
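A minimal sketch of the marshalling / unmarshalling idea, written by hand in Python. A real binding tool would generate the class from the XML Schema; here the class PhoneType, its fields, and the sample XML are assumptions made up for illustration.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class PhoneType:
    """Hand-written stand-in for a class a binding tool would generate."""
    name: str
    weight: str
    price: float


def unmarshal(xml_text: str) -> PhoneType:
    """XML -> object: element and attribute names are fixed by the mapping."""
    el = ET.fromstring(xml_text)
    return PhoneType(
        name=el.get("name"),
        weight=el.findtext("Weight"),
        price=float(el.findtext("Price")),
    )


def marshal(obj: PhoneType) -> str:
    """Object -> XML: the inverse direction of the same mapping."""
    el = ET.Element("Type", {"name": obj.name})
    ET.SubElement(el, "Weight").text = obj.weight
    ET.SubElement(el, "Price").text = str(obj.price)
    return ET.tostring(el, encoding="unicode")


phone = unmarshal('<Type name="7110"><Weight>141g</Weight><Price>199.5</Price></Type>')
phone.price = 149.0  # application code works on typed fields, not on DOM nodes
print(marshal(phone))
```

The sketch also hints at the stated disadvantage: if the schema gains or renames an element, both the class and the mapping functions have to be produced again.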
The only alternative such that …
  the data is modeled only once
  it is well integrated with the XML Schema type system
  it preserves the logical/physical data independence
  the code deals with non-generic structures
  the code can be optimized automatically
Data is stored
  in plain file systems or in dedicated data stores, e.g., XML extensions of RDBS
Missing pieces, under development
  procedural logic
  update language
  …
Purpose
  Original goal: selecting document parts for layout purposes (XSL)
  Now used by various XML standards, e.g., XML Schema, XPointer
  No XML syntax used – XPath has its own syntax
  Various selection criteria, e.g., element/attribute names, content, type
Basic Processing Principle
  Tree-based navigation, similar to navigation in a file system
  Starting point is always a certain context, i.e., a tree node specified by an XPath expression
  Navigation and filter steps modify the context
  Result of an XPath expression = context computed in the last step
Read-only language
  It cannot create nodes or modify existing nodes, except by calling functions written in another language
  However, it can create new atomic values and sequences of existing nodes
//Type[Price]
  all Type elements containing a Price child element
//Producer[ProducerNo]/Type[Price]
  all Type elements containing a Price child element, where the Type elements must be child elements of a Producer element that contains a ProducerNo child element
//Producer[Type/Price]
  all Producer elements containing a Type child element which in turn contains a Price child element
//Type[Weight and Price]
  all Type elements having both Weight and Price child elements
//Type[Weight = "141g"]
  all Type elements containing a Weight child element with value 141g
//Type[@name = "7110"]
  all Type elements having an attribute name with value 7110
Union |
  //Type/Weight | //Type/Price
  all Weight and Price child elements of Type elements
Index-based access via the node's context position
  //Type[1]
    every Type element that is the first Type child of its parent (use (//Type)[1] for the first Type element of the whole document)
  Type[last()]
    last Type child element of the context node
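Some of the expressions above can be tried out with Python's xml.etree.ElementTree, which implements only a subset of XPath 1.0 (no "and", no "|", no nested paths inside predicates) and expects relative paths such as ".//Type". The catalog document below is an assumption made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; element names follow the slide examples.
DOC = """
<Catalog>
  <Producer>
    <ProducerNo>1</ProducerNo>
    <Type name="7110"><Weight>141g</Weight><Price>199</Price></Type>
    <Type name="3310"><Weight>133g</Weight></Type>
  </Producer>
  <Producer>
    <Type name="T68"><Price>249</Price></Type>
  </Producer>
</Catalog>
"""
root = ET.fromstring(DOC)


def names(path):
    """Evaluate a path against the root and return the name attributes."""
    return [t.get("name") for t in root.findall(path)]


with_price = names(".//Type[Price]")                     # ['7110', 'T68']
nested = names(".//Producer[ProducerNo]/Type[Price]")    # ['7110']
by_weight = names(".//Type[Weight='141g']")              # ['7110']
by_attr = names(".//Type[@name='7110']")                 # ['7110']

# Position predicates count among the Type children of each parent ...
first_per_parent = names(".//Type[1]")                   # ['7110', 'T68']
last_per_parent = names(".//Type[last()]")               # ['3310', 'T68']
# ... whereas the first Type of the whole document is picked from the full list.
first_in_document = root.findall(".//Type")[0].get("name")  # '7110'

print(with_price, nested, first_per_parent, last_per_parent)
```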
Variable $qname
  Within XPath 1.0, variables can only be referenced
  The variable $qname has to be defined by the host language of the XPath expression (e.g., XSLT or XQuery)
  Note: XPath 2.0 can also bind values to variables ("for" clause)
Relative Path
  Processing starts at the current context node (determined, e.g., by the preceding location step)
Absolute Path
  Processing starts at the root node ("/"), independent of the current context
Location Step: AxisName::NodeTest[predicate]
AxisName – navigation via axis names (ancestor, etc.)
  Short forms for some axis names
    child::element-name            element-name
    attribute::attname             @attname
    /descendant-or-self::node()/   //
    self::node()                   .
::NodeTest – node filtering (1)
  Name of the node, or
  Wildcard: "*" – arbitrary elements, "@*" – arbitrary attributes, or
  Type of the node on the basis of a function (text(), comment(), processing-instruction(), node())
  Result = set of nodes
[predicate] – node filtering (2)
  A filter on all nodes selected by the NodeTest, e.g., specification of the context position via the node's number
  Multiple predicates are processed from left to right
  Result = Boolean value
  Predicates may in turn contain location paths
  E.g., selecting a node only if certain elements/attributes exist in the context of this node
    //address[tel/@type="work"]
Everything is a "sequence", and sequence processing
  Construction operators
  Filter
  New set operators in addition to UNION
  Functions for list manipulation
  Aggregation functions
Support of XML Schema's Type System
  Type annotations
  Typed values
  Type expressions
Changes to Path Expressions
  Node tests now also on the basis of XML Schema types
  Location steps can now be defined by function calls
New Expressions
  Control primitives: «for» and «if»
  Quantifiers: «some» and «every»
Consequence of "everything is a sequence"
  Every operand of an expression is a sequence
  Every result of an expression is a sequence
2 characteristics: closure and composability
  Closure: the language is closed, i.e., every possible operation applied to a sequence again generates a sequence
  Composability: therefore, expressions can be nested arbitrarily
Union (alternative: | as in XPath 1.0)
  (A, B) union (A, B) → (A, B)
  (A, B) union (B, C) → (A, B, C)
Intersection
  (A, B) intersect (A, B) → (A, B)
  (A, B) intersect (B, C) → (B)
  XPath 1.0 versus XPath 2.0: determine whether the node $x is included in the /foo/bar node-set
    XPath 1.0: count(/foo/bar) = count(/foo/bar | $x)
    XPath 2.0: $x intersect /foo/bar
Difference
  (A, B) except (A, B) → ()
  (A, B) except (B, C) → (A)
  XPath 1.0 versus XPath 2.0: select all attributes except the one with a given namespace-qualified name
    XPath 1.0: @*[not(namespace-uri()='http://example.com' and local-name()='foo')]
    XPath 2.0: @* except @exc:foo
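The behaviour of the three set operators (duplicates removed by node identity, result in document order) can be imitated in Python. This is an emulation for illustration only, not an XPath 2.0 engine; the helper names and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed sample document with three nodes A, B, C.
root = ET.fromstring("<foo><bar id='A'/><bar id='B'/><bar id='C'/></foo>")
A, B, C = root.findall("bar")
# Position of every node in document order, keyed by object identity.
DOC_ORDER = {id(node): pos for pos, node in enumerate(root.iter())}


def _ordered(nodes):
    """Remove duplicates by node identity and sort into document order."""
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: DOC_ORDER[id(n)])


def union(left, right):
    return _ordered(list(left) + list(right))


def intersect(left, right):
    right_ids = {id(n) for n in right}
    return _ordered(n for n in left if id(n) in right_ids)


def except_(left, right):
    right_ids = {id(n) for n in right}
    return _ordered(n for n in left if id(n) not in right_ids)


def ids(nodes):
    return [n.get("id") for n in nodes]


print(ids(union([A, B], [B, C])))      # ['A', 'B', 'C']
print(ids(intersect([A, B], [B, C])))  # ['B']
print(ids(except_([A, B], [B, C])))    # ['A']

# The XPath 1.0 membership trick: adding $x to the set must not change its size.
bars = root.findall("bar")
b_is_member = len(union(bars, [B])) == len(bars)        # True
root_is_member = len(union(bars, [root])) == len(bars)  # False
```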
XPath 1.0 supports
  Node-sets
  Booleans
  Strings
  A single numeric data type (double-precision floating point)
  Weakly typed language
XPath 2.0 supports
  Sequences as a data type
  All 19 primitive simple types built into XML Schema, such as integers, decimals, single precision, dates, times, durations, …
  User-defined data types
  Strong type checking as well as weak type checking
  Hybrid language – satisfies both the data-oriented and the document-oriented world
XPath 2.0 Path Expressions – Node Test by Schema Type
Node tests in XPath 1.0
  On the basis of the node's name and its 7 predefined node types
Node tests in XPath 2.0
  Also on the basis of the node's type as defined by XML Schema
  For example, select all elements of type Person, regardless of their name: element(*, Person)
  Especially useful with a schema that has a rich type hierarchy, in which many elements are derived from the same type definition
XPath 2.0 Path Expressions – Function as Location Step
Now, a function call can be used as a location step
  Allows following logical relationships in the document's structure, not just the physical relationships given by the hierarchy
  Example: «customer[@id="123"]/find-orders(.)/order-value»
  The person writing a path expression doesn't necessarily need to know how the orders for a customer are found
    this supports a kind of information hiding (encapsulation)
    the way they are found can change without invalidating the expression (locality of change)
XPath itself does not allow writing the find-orders() function; it has to be provided by the host language (e.g., XSLT 2.0 or XQuery)
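The encapsulation idea behind a function used as a location step can be sketched in Python. The document layout (orders referring to customers via a customer attribute) and the function find_orders are hypothetical; the slides do not say how orders are actually found.

```python
import xml.etree.ElementTree as ET

# Assumed layout: orders are NOT children of their customer, so the
# relationship is logical (via an id), not given by the hierarchy.
root = ET.fromstring("""
<shop>
  <customer id="123"/>
  <customer id="456"/>
  <order customer="123"><order-value>40</order-value></order>
  <order customer="456"><order-value>15</order-value></order>
  <order customer="123"><order-value>25</order-value></order>
</shop>
""")


def find_orders(customer):
    """Hypothetical lookup that follows the logical customer -> order link."""
    return [o for o in root.findall("order")
            if o.get("customer") == customer.get("id")]


# Analogue of: customer[@id="123"]/find-orders(.)/order-value
customer = root.find("customer[@id='123']")
order_values = [o.findtext("order-value") for o in find_orders(customer)]
print(order_values)  # ['40', '25']
```

If the orders were later moved, for example into a separate document, only find_orders would change while the calling expression stays the same.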
XPath 2.0 – Existential «some» and Universal «every» Quantifiers
The XPath 1.0 equals operator (=) can compare node-sets
  /students/student/name = "Fred" returns true if any student name is equal to "Fred" (existential quantification)
  The same applies to !=, <, >, …
    e.g., /students/student/name != "Fred" returns true if any student name is not equal to "Fred"
XPath 2.0 makes it possible to write explicitly quantified expressions, both existentially and universally quantified
  some $x in /students/student/name satisfies $x = "Fred"
  every $x in /students/student/name satisfies $x = "Fred"
This formulation is more powerful, because the constraining condition can be anything (not just =, !=, < and so on)
  some $item in //LineItem satisfies (($item/Price * $item/Quantity) > 100)
  some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4
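The quantifier semantics map closely onto Python's any() and all(). The sketch below uses made-up sample data and is meant only to show which cases evaluate to true, including the existential reading of "!=".

```python
# Assumed sample data standing in for /students/student/name.
names = ["Fred", "Anna"]

some_fred = any(n == "Fred" for n in names)       # some ... satisfies $x = "Fred"
every_fred = all(n == "Fred" for n in names)      # every ... satisfies $x = "Fred"

# XPath 1.0 "!=" on a node-set is existential: true if ANY name differs.
some_not_fred = any(n != "Fred" for n in names)
# "No student is called Fred" needs a negated "=" instead.
no_fred = not any(n == "Fred" for n in names)

# Arbitrary conditions: some $item in //LineItem satisfies Price * Quantity > 100
line_items = [{"price": 30.0, "quantity": 2}, {"price": 60.0, "quantity": 2}]
some_expensive = any(i["price"] * i["quantity"] > 100 for i in line_items)

# Two range variables: some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4
some_pair = any(x + y == 4 for x in (1, 2, 3) for y in (2, 3, 4))
every_pair = all(x + y == 4 for x in (1, 2, 3) for y in (2, 3, 4))

print(some_fred, every_fred, some_not_fred, no_fred)  # True False True False
print(some_expensive, some_pair, every_pair)          # True True False
```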
Introduction
For and let clauses
Adding Elements/Attributes to Results
Conditional Expressions
Joins
Quantifiers
Distinctness & Grouping
Sorting & Aggregating
Structure of an XQuery Program
Appendix
XML & DB
XPath 2.0
  Common language for navigation, selection, extraction
  Used in XSLT, XQuery, XPointer, XML Schema, XForms, etc.
XSLT 2.0: XML ⇒ XML, HTML, Text
  Loosely typed scripting language
  Format XML in HTML for display in a browser
  Must be highly tolerant of variability/errors in data
XQuery 1.0: XML ⇒ XML
  Strongly typed query language – enforces input and output types
  Must guarantee safety/correctness of operations on data – side-effect free
  Large-scale database access
W3C Recommendations, Jan. 2007
  XQuery 1.0 and XPath 2.0 Functions and Operators
    the functions you can call in XPath expressions and the operations you can perform on XPath 2.0 data types
  XQuery 1.0 and XPath 2.0 Data Model (XDM)
    representation and access for both XML and non-XML sources
  XSLT 2.0 and XQuery 1.0 Serialization
    how to output the results of XSLT 2.0 and XML Query evaluation in XML, HTML or as text
  XML Syntax for XQuery 1.0 (XQueryX)
    an XML-aware syntax for querying collections of structured and semi-structured data both locally and over the Web
  XQuery 1.0 and XPath 2.0 Formal Semantics
    the type system used in XQuery and XSLT 2.0 via XPath, defined precisely for implementers
W3C Working Drafts / Java Community Process
  XQuery Update – Candidate Recommendation since August 2008
  XQuery and XPath Full Text Search
  XQJ – XQuery API for Java (~ JDBC)
Adding Elements/Attributes to Results – Three Use Cases
(1) 1:1 copying of elements/attributes from the input document
  Simple elements
  Complex elements – along with their attributes and children, if any (not just their atomic values!)
  No opportunity to change attributes, children, etc.
(2) Direct element/attribute constructors – a mixture of ...
  Literal content ("hard-coded") – appears as is in the output document
  Expressions within "{}" evaluating to any kind of node (elements, attributes, etc.) and to atomic values
  Using XML syntax (proper nesting, case sensitivity, etc.)
(3) Computed constructors
  Allow for dynamic names of nodes and dynamic values
  Copying tags from the input document but making minor changes (e.g., adding an attribute)
  Turning content from the input document into markup
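The three use cases can be mimicked with Python's ElementTree to make the distinction concrete. This is an analogy rather than XQuery: the input element and the output names Offer, Cost, and currency are assumptions chosen for the sketch.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed input element.
src = ET.fromstring('<Type name="7110"><Weight>141g</Weight><Price>199</Price></Type>')

# (1) 1:1 copy: the element with all attributes and children, unchanged.
copied = copy.deepcopy(src)

# (2) Direct-constructor analogue: element names are literal ("hard-coded"),
#     attribute values and content are computed from the input.
offer = ET.Element("Offer", {"model": src.get("name")})
ET.SubElement(offer, "Cost").text = src.findtext("Price")

# (3) Computed-constructor analogue: the element NAME itself comes from data
#     (content turned into markup), and the copy gets a minor change.
dynamic = ET.Element("Model" + src.get("name"))
dynamic.extend([copy.deepcopy(child) for child in src])
dynamic.set("currency", "EUR")

for element in (copied, offer, dynamic):
    print(ET.tostring(element, encoding="unicode"))
```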
Structure of an XQuery Program – Prolog, Body, Modules
Prolog
  Role
    is the link between the XQuery expression and the environment in which the expression is embedded
  Parts
    namespace declarations
    schema imports
    default element and function namespace
    function declarations
    function library imports
    global and external variable definitions, etc.
    each declaration is separated by a semicolon
Body
  Contains the XQuery expression within { }
Note!
  A function does not inherit the context from the main body of the query; rather, the context has to be passed as a parameter
Existing DBs store large amounts of data
  Publish data as XML documents
Existing DBs should store existing XML documents
  Storage in the DB along with additional "meta" information
Well-known Benefits of DBs
  Efficient storage of large amounts of well-structured data
  Structured query language (SQL)
  Optimization
  Views and security mechanisms
  Concurrency control / transactions – more fine-grained than just on a document level
  Recovery techniques
[Figure: XML Doc. <a><b>...</b><c d=.../></a>]
DBs are essential cornerstones of today's IT infrastructures – the importance of DBs for Web applications is steadily increasing
  "... The Web is one huge database..."
  [The Asilomar Report on Database Research, SIGMOD Record 27(4), Dec. 1998]
Motivation – The Challenge: Different Categories of XML Documents
Data-oriented
  Well-known, fine-grained, typed structure
  Ordering of subelements doesn't matter
  Schema available, defining the structure
  Examples: order, invoice
Document-oriented
  Semi-structured, coarse-grained, untyped
  Ordering of subelements is significant
  Mixed content is common
  Schema often non-existent or very generic
  Example: Claim
<Claim>A severe <Reason>fire</Reason> damaged the building and claimed <DeathToll>12</DeathToll> lives. First investigations done by police indicate fire raising with <Motive>criminal intent</Motive>.</Claim>
<Email><Sender>[email protected]</Sender>...<Recipient>[email protected]</Recipient><Content>All the best to your 110th birthday!</Content></Email>
Conceptual XML mapping to a fine-grained storage structure
  Transformation into an internal XML tree
  The internal tree often resembles a DOM tree
  Element names are replaced by means of a dictionary
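One way to picture such a fine-grained storage structure is a node table plus a name dictionary. The sketch below uses an in-memory SQLite database; the table layout, column names, and helper functions are assumptions for illustration and do not describe any particular product. Attributes and mixed content are left out to keep it short.

```python
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE nodes (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES nodes(id),
        position  INTEGER,                      -- order among siblings
        name_id   INTEGER REFERENCES names(id), -- dictionary entry, not the name
        text      TEXT
    );
""")


def name_id(name):
    """Look up an element name in the dictionary, adding it on first use."""
    db.execute("INSERT OR IGNORE INTO names (name) VALUES (?)", (name,))
    return db.execute("SELECT id FROM names WHERE name = ?", (name,)).fetchone()[0]


def store(element, parent_id=None, position=0):
    """Store one element as a row, then recurse into its children."""
    text = (element.text or "").strip() or None
    cur = db.execute(
        "INSERT INTO nodes (parent_id, position, name_id, text) VALUES (?, ?, ?, ?)",
        (parent_id, position, name_id(element.tag), text),
    )
    node_id = cur.lastrowid
    for pos, child in enumerate(element):
        store(child, node_id, pos)


store(ET.fromstring("<a><b>x</b><c/><b>y</b></a>"))

rows = db.execute("""
    SELECT names.name, nodes.text
    FROM nodes JOIN names ON names.id = nodes.name_id
    ORDER BY nodes.id
""").fetchall()
name_count = db.execute("SELECT COUNT(*) FROM names").fetchone()[0]
print(rows, name_count)
```

The element name b occurs twice in the document but only once in the dictionary table, which is the point of replacing names by dictionary entries.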
Benefits
  Takes advantage of the entire SQL infrastructure (e.g., triggers, PL/SQL)
  Transactional support
  Scalability, clustering, reliability
  Global optimization (XML and relational)
  Standard implemented and supported by Microsoft, Oracle, IBM, etc.
Drawbacks
  Requires data to be loaded into the DB
    not good for temporary XML data
    not worth the effort for small volumes of data
  Blending of the two languages (SQL, XQuery) isn't natural
  XQuery is not supported entirely by DB engines
Interesting collection of papers:
  http://www.cs.cornell.edu/People/jai/pubs.html#PaperCategory:PublishingRelationalDataAsXML
GI Working Group "Web und Datenbanken": http://dbs.uni-leipzig.de/webdb/
M. Koran: Evaluierung von XML Datenbanken, Master's thesis, Universität Zürich, October 2006 [http://www.ifi.uzh.ch/index.php?id=490&print=1&no_cache=1]
Books
  H. Katz et al.: XQuery from the Experts, Addison Wesley, 2004.
  J. Melton et al.: Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann/Elsevier, 2006.
  M. Klettke, H. Meyer: XML & Datenbanken: Konzepte, Sprachen und Systeme, dpunkt, 2003. http://www.xml-und-datenbanken.de/
  E. Rahm, G. Vossen (eds.): Web & Datenbanken: Konzepte, Architekturen, Anwendungen, dpunkt, 2003.
  B. Gorke: XML-Datenbanken in der Praxis, bomots Verlag, 2006.