Tamino – a DBMS Designed for XML Dr. Harald Schoning Presenter: Wenhui Li University of Ottawa Instructed by: Dr. Mengchi Liu Carleton University

Tamino – a DBMS Designed for XML

Jan 13, 2016


Marjan Goodarzi

Transcript
Page 1: Tamino  – a DBMS Designed for XML

Tamino – a DBMS Designed for XML

Dr. Harald Schoning

Presenter: Wenhui Li, University of Ottawa

Instructed by: Dr. Mengchi Liu, Carleton University

Page 2: Tamino  – a DBMS Designed for XML

Abstract

Who? Software AG
What? An XML database management system
When? First unveiled in 1999; Tamino XML Server 4.2 in June 2004
Why? Management and transfer of structured and unstructured data; designed completely for XML

Page 3: Tamino  – a DBMS Designed for XML

Industry Background

XML is becoming the prevailing format for data processing on the Internet.
Early goal of Tamino: easy data exchange
Evolution trend:
  Storing, managing, publishing and exchanging XML documents
  Business modeling

Page 4: Tamino  – a DBMS Designed for XML

Industry Background (cont'd)

XML support in databases:
  Oracle XML Developer's Kit
  SQL Server 2000
  DB2 XML Extender

Page 5: Tamino  – a DBMS Designed for XML

Limitations of XML support via a traditional RDBMS or ORDB

XML is not as regularly structured as the data in an RDB, ORDB or OODB

Storing and querying XML is possible but not practical in these database systems

Page 6: Tamino  – a DBMS Designed for XML

Two modeling approaches

Data-centric documents
  Regular structure
  Order does not matter
  No mixed content
Document-centric documents
  Less regular structure
  Order is significant
  Mixed content
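
The mixed-content criterion above can be checked mechanically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `has_mixed_content` and the two sample fragments are illustrative assumptions, not part of Tamino.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(element):
    """Return True if any element holds both child elements and non-whitespace text."""
    for node in element.iter():
        children = list(node)
        if not children:
            continue
        # Text before the first child, plus the text that follows each child (its tail).
        pieces = [node.text or ""] + [child.tail or "" for child in children]
        if any(piece.strip() for piece in pieces):
            return True
    return False

# Illustrative fragments: an order record and a piece of marked-up prose.
data_centric = ET.fromstring(
    '<Order><Item ProductID="17" Quantity="2"/><Item ProductID="16" Quantity="9"/></Order>'
)
document_centric = ET.fromstring(
    "<text>X<italic>M</italic>L is <stressed>very</stressed> important</text>"
)

print(has_mixed_content(data_centric))      # False
print(has_mixed_content(document_centric))  # True
```

Absence of mixed content is only one of the three criteria listed; regularity of structure and order sensitivity would need separate checks.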

Page 7: Tamino  – a DBMS Designed for XML

Why not use a relational DB?

XML documents can have schema information (a DTD), but they are not required to.

Classical database handling, which deals with objects of a predefined type, cannot be applied to XML.

Page 8: Tamino  – a DBMS Designed for XML

Why not use XML itself?

XML is just a markup language; it does not contain processing facilities of its own.

Querying a set of XML documents is outside the scope of the XML recommendation.

Hence Tamino!

Page 9: Tamino  – a DBMS Designed for XML

What does Tamino do?

What is Tamino? (see the 1st slide)
Stores XML documents, HTML files, GIF images, etc.
Retrieves them in a set-oriented manner, with sophisticated query facilities

Page 10: Tamino  – a DBMS Designed for XML

Tamino’s architecture

Page 11: Tamino  – a DBMS Designed for XML

The schema of XML documents

XML supports schema information, but it differs from that of classical databases
DTDs have a couple of deficiencies (e.g. regarding data types)
A W3C working group is developing an XML schema description language
However, the DTD is the only standard schema language at present

Page 12: Tamino  – a DBMS Designed for XML

XML schema vs. RDB and OODB schema

In an RDB or OODB, the schema is created before instances can be stored

Instances must conform to the declared schema

In an XML database, each instance declares a schema of its own.

For XML documents, grouping objects of homogeneous structure into (predefined) tables or classes does not work

Page 13: Tamino  – a DBMS Designed for XML

Query and index of XML schema

Queries operate on sets
Indexes are defined on the basis of a common schema
For the purpose of querying, arbitrary objects could be grouped into sets
Index definition also requires at least a common subset in the structure

Page 14: Tamino  – a DBMS Designed for XML

Schema handling in Tamino

Documents are grouped by an open content model plus user-directed document grouping
Documents are grouped into collections
Within a collection, several document types are declared
For each document type, a common schema is defined (open content model)
To each document, Tamino assigns one of the document types

Page 15: Tamino  – a DBMS Designed for XML

Type assignment

Assignment is based on the root element type
A document must match the schema of its assigned document type, but may have additional elements/attributes
Within a document type, documents may differ considerably
If there is no appropriate document type, the document is stored without any schema checking
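
The assignment rules above can be sketched in a few lines of Python. This is an illustration of the idea only, not Tamino's implementation: the `DOCUMENT_TYPES` table, the function name, the set of required children and the status strings are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: document type (root element name) -> required child elements.
DOCUMENT_TYPES = {
    "City": {"Name", "Monument"},
}

def assign_document_type(xml_text):
    """Return (document type or None, status), based on the root element type."""
    root = ET.fromstring(xml_text)
    required = DOCUMENT_TYPES.get(root.tag)
    if required is None:
        # No appropriate document type: stored without any schema checking.
        return None, "stored without schema check"
    present = {child.tag for child in root}
    missing = required - present
    if missing:
        return root.tag, "rejected: missing " + ", ".join(sorted(missing))
    # Open content model: additional elements such as <Addition> are tolerated.
    return root.tag, "accepted"

city = (
    '<City Inhabitants="138000"><Name>Darmstadt</Name>'
    "<Addition>The city of art nouveau</Addition>"
    '<Monument Height="39m"><Name>Langer Ludwig</Name></Monument></City>'
)
print(assign_document_type(city))                  # ('City', 'accepted')
print(assign_document_type("<Memo>hello</Memo>"))  # (None, 'stored without schema check')
```

The `<Addition>` element is not in the assumed schema, yet the document is accepted; that is the open content model at work.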

Page 16: Tamino  – a DBMS Designed for XML

Tamino schema example

Page 17: Tamino  – a DBMS Designed for XML

Document accepted by Tamino

<City Inhabitants="138000">
  <Name>Darmstadt</Name>
  <Addition>The city of art nouveau</Addition>
  <Monument Height="39m">
    <Name>Langer Ludwig</Name>
    <Location>
      <Name>Luisenplatz</Name>
      <MapIndex>M5</MapIndex>
    </Location>
  </Monument>
</City>

Page 18: Tamino  – a DBMS Designed for XML

When should an element/attribute be modeled?

An index will be defined on this element/attribute
The element/attribute is to be mapped to an external data source or to a server extension
Dedicated access rights will be defined on the element/attribute
The presence/multiplicity of the element is to be enforced
One of the above conditions holds for a child of the element

Page 19: Tamino  – a DBMS Designed for XML

Indexing of Tamino

Value-based indexes
  Well known from traditional database systems
  Used to accelerate searches
  Must address the data object exactly
  Names need not be unique within a DTD

Page 20: Tamino  – a DBMS Designed for XML

Example of value-based index

Value-based indexes: data-centric view

<!ELEMENT City (Name, Inhabitants, Monument+)>
<!ELEMENT Monument (Name, Description)>
<!ELEMENT Inhabitants (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
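
Because `Name` occurs under both `City` and `Monument` in this DTD, a value index has to be keyed by the path to the element, not by the element name alone. The sketch below illustrates that point with a plain dictionary; `build_value_index`, the sample documents and all values in the second document are made up for the example and say nothing about how Tamino stores its indexes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative documents following the DTD above; document 2 is entirely made up.
documents = {
    1: "<City><Name>Darmstadt</Name><Inhabitants>138000</Inhabitants>"
       "<Monument><Name>Langer Ludwig</Name><Description>Column</Description></Monument></City>",
    2: "<City><Name>Exampleville</Name><Inhabitants>1000</Inhabitants>"
       "<Monument><Name>Old Gate</Name><Description>Gate</Description></Monument></City>",
}

def build_value_index(docs, path):
    """Map each value found at `path` (relative to the root) to the ids of its documents."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for element in root.findall(path):
            index[(element.text or "").strip()].add(doc_id)
    return index

city_names = build_value_index(documents, "Name")               # City/Name only
monument_names = build_value_index(documents, "Monument/Name")  # City/Monument/Name only

print(sorted(city_names))      # ['Darmstadt', 'Exampleville']
print(sorted(monument_names))  # ['Langer Ludwig', 'Old Gate']
```

A lookup such as `city_names["Darmstadt"]` then returns the matching document ids without scanning every document.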

Page 21: Tamino  – a DBMS Designed for XML

Indexing of Tamino (cont'd)

Text indexing
  Document-centric view
  The scope can be limited to a specific part of the document
  The scope might span element content

Page 22: Tamino  – a DBMS Designed for XML

Example of text index

Text indexing: document-centric view

<statement>
  <author>
    <firstname>Harald</firstname>
    <lastname>Schoning</lastname>
  </author>
  <text>X<italic>M</italic>L and X<italic>S</italic>L are <stressed>very</stressed> important</text>
</statement>
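
In this example the word "XML" is split by markup, so a text index has to work on the concatenated text of the scoped element, ignoring the tags inside it. A minimal sketch of that idea, assuming a simple word-to-document inverted index; the function name and the tokenisation rule are illustrative choices, not Tamino's.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

statement = ET.fromstring(
    "<statement><author><firstname>Harald</firstname>"
    "<lastname>Schoning</lastname></author>"
    "<text>X<italic>M</italic>L and X<italic>S</italic>L are "
    "<stressed>very</stressed> important</text></statement>"
)

def build_text_index(doc_id, root, scope, index):
    """Index the words found inside every <scope> element, ignoring inner markup."""
    for element in root.iter(scope):
        plain = "".join(element.itertext())  # "XML and XSL are very important"
        for word in re.findall(r"\w+", plain.lower()):
            index[word].add(doc_id)

index = defaultdict(set)
build_text_index(1, statement, "text", index)  # scope limited to <text>

print(sorted(index))  # ['and', 'are', 'important', 'very', 'xml', 'xsl']
```

The author's name is not indexed because it lies outside the chosen scope, while "xml" is found even though it spans three text nodes.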

Page 23: Tamino  – a DBMS Designed for XML

Indexing of Tamino (cont'd)

Structural index
  Useful if multiplicity permits the omission of elements or if no DTD is known
  Example: in a database of all European cities, search for all cities that have an element called “beach”
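
A structural index can be pictured as a map from element names to the documents that contain them. The following sketch assumes exactly that simple layout; the document ids, the second city and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative collection; "exampleport" is a made-up city.
cities = {
    "darmstadt": "<City><Name>Darmstadt</Name>"
                 "<Monument><Name>Langer Ludwig</Name></Monument></City>",
    "exampleport": "<City><Name>Exampleport</Name>"
                   "<beach><Name>North Shore</Name></beach></City>",
}

def build_structural_index(docs):
    """Map every element name to the set of documents in which it occurs."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for element in ET.fromstring(xml_text).iter():
            index[element.tag].add(doc_id)
    return index

structure = build_structural_index(cities)
print(sorted(structure.get("beach", set())))  # ['exampleport']
```

The query "which cities have a beach element" is answered from the index alone, without opening any document.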

Page 24: Tamino  – a DBMS Designed for XML

Querying XML documents

Currently, there is no standardized query language
XPath allows positioning within a single document
XPath fits the retrieval needs of data-centric environments well
Document-centric environments need a more content-based retrieval facility
Tamino also supports full-text search
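
To make the "single document" point concrete: a path expression is evaluated against one document at a time, so a set-oriented query over a collection has to apply it to every document. The sketch below uses the limited XPath subset of Python's `xml.etree.ElementTree`, not Tamino's query interface; the collection and the helper `query_collection` are assumptions for the example.

```python
import xml.etree.ElementTree as ET

collection = [
    '<City Inhabitants="138000"><Name>Darmstadt</Name>'
    '<Monument Height="39m"><Name>Langer Ludwig</Name>'
    "<Location><Name>Luisenplatz</Name><MapIndex>M5</MapIndex></Location>"
    "</Monument></City>",
    "<City><Name>Exampleville</Name></City>",  # made-up second document
]

def query_collection(docs, path):
    """Apply one path expression to every document and collect all matches."""
    results = []
    for xml_text in docs:
        root = ET.fromstring(xml_text)
        results.extend(root.findall(path))
    return results

monuments = query_collection(collection, "./Monument[@Height='39m']/Name")
print([element.text for element in monuments])  # ['Langer Ludwig']

map_cells = query_collection(collection, ".//MapIndex")
print([element.text for element in map_cells])  # ['M5']
```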

Page 25: Tamino  – a DBMS Designed for XML

Expectations for an XML processor

W3C: the XML recommendation specifies the handling of entities, comments and processing instructions.

User: Tamino should leave comments intact, evaluate no processing instructions, and leave entity references unresolved.

User: the output of a Tamino query should match the specification of an XML processor.

Page 26: Tamino  – a DBMS Designed for XML

Why not leave entities unresolved?

A query result may be a set of (parts of) matching documents

The result DTD must then include all the different entity declarations of the original documents

The definition of an entity might differ from document to document

So, where the same entity name has different definitions, entities are renamed and the entity references are changed accordingly.
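
The renaming step can be illustrated with a small merge routine. This is a sketch under stated assumptions: each document's entity declarations are given as a plain dictionary, the sample values are invented, and the `_docN` suffix is an arbitrary naming scheme chosen for the example (a real system would also have to ensure the new name is itself unused).

```python
def merge_entities(per_document_entities):
    """Merge entity declarations of several documents into one result DTD.

    Returns (merged, renames): `merged` maps entity names in the result DTD to
    their definitions; `renames` holds, per document, old name -> name to use
    when rewriting that document's entity references.
    """
    merged = {}
    renames = []
    for doc_number, entities in enumerate(per_document_entities, start=1):
        mapping = {}
        for name, definition in entities.items():
            new_name = name
            if name in merged and merged[name] != definition:
                # Same name, different definition: rename (hypothetical scheme).
                new_name = f"{name}_doc{doc_number}"
            merged[new_name] = definition
            mapping[name] = new_name
        renames.append(mapping)
    return merged, renames

# Two documents that both declare &co; with different replacement text (made-up values).
merged, renames = merge_entities([{"co": "Software AG"}, {"co": "Carleton University"}])
print(merged)   # {'co': 'Software AG', 'co_doc2': 'Carleton University'}
print(renames)  # [{'co': 'co'}, {'co': 'co_doc2'}]
```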

Page 27: Tamino  – a DBMS Designed for XML

Problems of external entities

These entities can change without the database system knowing about it
Thus, the values of external entities must not be included in indexes
Example:

<!ENTITY mysubject SYSTEM "http://www.softwareag.com/hottopic.xml">
...
<ticker>Today's hot topic: &mysubject;</ticker>

Checking the current contents of the external entity would lead to unacceptable response times.

Page 28: Tamino  – a DBMS Designed for XML

Relational databases and XML

Major (object-)relational database systems include some form of XML support

The simplest form is to generate XML documents from existing relational data.

But real database handling of XML requires that XML data can be stored and retrieved

Two approaches

Page 29: Tamino  – a DBMS Designed for XML

XML support approach (1)

Map the XML document to relational tables and their columns

Markup is ignored on storage and reconstructed on retrieval

Advantage of this approach: the contents of an XML document can be handled with traditional SQL

Page 30: Tamino  – a DBMS Designed for XML

XML support approach (1) (cont'd)

Shortcoming: the sequence information is lost

<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="19" Quantity="8"/>
</Order>

The order as retrieved:

<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="19" Quantity="8"/>
</Order>
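
The loss of sequence can be reproduced with any relational store, since a table has no inherent row order. The sketch below uses Python's built-in `sqlite3`; the table layout is an assumption, and the extra `position` column is a common workaround added here for contrast, not something the slides describe.

```python
import sqlite3

# Items in document order, as in the stored <Order> above.
items = [("17", 2), ("16", 9), ("19", 8)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (order_id INTEGER, product_id TEXT, quantity INTEGER, position INTEGER)"
)
conn.executemany(
    "INSERT INTO item VALUES (1, ?, ?, ?)",
    [(product_id, quantity, position) for position, (product_id, quantity) in enumerate(items)],
)

# A table is an unordered set of rows; sorting by key gives the retrieval result shown above.
by_key = [row[0] for row in conn.execute(
    "SELECT product_id FROM item WHERE order_id = 1 ORDER BY product_id")]

# Document order survives only if it was stored explicitly (assumed extra column).
by_position = [row[0] for row in conn.execute(
    "SELECT product_id FROM item WHERE order_id = 1 ORDER BY position")]

print(by_key)       # ['16', '17', '19']
print(by_position)  # ['17', '16', '19']
```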

Page 31: Tamino  – a DBMS Designed for XML

XML support approach (1) (cont'd)

For data-centric documents the sequence might not matter, but it does for document-centric documents

This approach loses all comments and processing instructions

Mixed content cannot be stored easily in this model

Page 32: Tamino  – a DBMS Designed for XML

XML support approach (2)

Leave the XML document intact and store it in a large text field (“BLOB”)

Or even outside the database
Text search is possible
Retrieval can be limited by certain text-based conditions

Page 33: Tamino  – a DBMS Designed for XML

XML support approach (2) (cont'd)

Limitations:
  No structure-aware combinations are possible
  Value-based search is not supported on these text fields
  IBM solution: side tables
  But direct manipulation of side tables destroys the consistency of the database
  Security can be defined at document level only, not on elements or attributes
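
The first limitation is easy to demonstrate: once a document is stored as one text field, a search can say that a string occurs somewhere, but not in which element. A minimal sketch with Python's built-in `sqlite3`; the table layout, the sample documents and the helper `text_search` are assumptions for the example.

```python
import sqlite3

# Whole documents stored as opaque text (illustrative data).
docs = [
    "<City><Name>Darmstadt</Name><Monument><Name>Langer Ludwig</Name></Monument></City>",
    "<City><Name>Ludwigshafen</Name></City>",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO doc (body) VALUES (?)", [(d,) for d in docs])

def text_search(term):
    """Return ids of documents whose text contains `term` anywhere."""
    rows = conn.execute(
        "SELECT id FROM doc WHERE body LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [row[0] for row in rows]

# Intended question: which cities have "Ludwig" in their own Name?
# The text search cannot tell a city name from a monument name.
print(text_search("Ludwig"))  # [1, 2]
```

Only document 2 answers the intended question; document 1 matches because of its monument, which is the kind of false hit a structure-aware query would avoid.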

Page 34: Tamino  – a DBMS Designed for XML

Summary

Tamino was designed with particular attention to XML
Schema handling for XML differs from schema handling in relational databases
In schema handling, external entities cause conceptual problems
Value-based indexes are useful for XML, as are text indexes and structural indexes
Comments and processing instructions should be preserved when documents are stored
The result of a query against an XML database should be XML

Page 35: Tamino  – a DBMS Designed for XML

Q&A

Thanks!