Tamino – a DBMS Designed for XML
Dr. Harald Schoning
Presenter: Wenhui Li, University of Ottawa
Instructed by: Dr. Mengchi Liu, Carleton University
Jan 13, 2016
Abstract
Who? Software AG
What? An XML database management system
When?
  1999: first unveiled
  June 2004: Tamino XML Server 4.2
Why?
  Management and transfer of structured and unstructured data
  Designed entirely for XML
Industry Background
XML is becoming the prevailing format for data processing on the Internet.
Early goal of Tamino
  Easy data exchange
Evolution trend
  Storing, managing, publishing and exchanging XML documents
  Business modeling
Industry Background (cont'd)
XML support in databases
  Oracle XML Developer's Kit
  SQL Server 2000
  DB2 XML Extender
Limitations of XML support via a traditional RDBMS or ORDB
  XML is not as rigidly structured as data in an RDB, ORDB or OODB
  Storing and querying XML is possible but not practical in these database systems
Two Modeling Approaches
Data-centric documents
  Regular structure
  Order does not matter
  No mixed content
Document-centric documents
  Less regular structure
  Order is significant
  Mixed content
Why not use a relational DB?
XML documents can carry schema information (a DTD), but they are not required to.
Classical database handling, which assumes objects of a predefined type, cannot be applied to XML.
Why not use XML by itself?
XML is just a markup language; it has no processing facilities of its own.
Querying a set of XML documents is outside the scope of the XML recommendation.
Hence Tamino!
What does Tamino do?
What is Tamino? (see the 1st slide)
Stores XML documents, HTML files, GIF images, etc.
Retrieves them in a set-oriented manner, with sophisticated query facilities
Tamino’s architecture
The schema of XML documents
XML supports schema information, but in a different way from classical databases.
DTDs have a couple of deficiencies (e.g. regarding data types).
A W3C working group is developing an XML schema description language.
However, the DTD is the only standard schema language at present.
XML schema vs. RDB and OODB schema
In an RDB or OODB, the schema is created before instances can be stored.
Instances must conform to the declared schema.
In an XML database, each instance can declare a schema of its own.
For XML documents, grouping objects of homogeneous structure into predefined tables or classes does not work.
Queries and Indexes with XML Schemas
Queries operate on sets.
Indexes are defined on the basis of a common schema.
For the purpose of querying, arbitrary objects could be grouped into sets.
Index definition also requires at least a common subset in the structure.
Schema handling in Tamino
Documents are grouped by an open content model plus user-directed document grouping.
Documents are grouped into collections.
Within a collection, several document types can be declared.
For each document type, a common schema (open content model) is defined.
Tamino assigns each document to one of the document types.
Type Assignment
Assignment is based on the root element type.
A document must match the schema of its assigned document type, but may have additional elements/attributes.
Within a document type, documents may differ considerably.
If there is no appropriate document type, the document is stored without any schema checking.
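The assignment rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Tamino's implementation: the DOCUMENT_TYPES table, the function name assign_document_type and the sample documents are all hypothetical, and the "schema" is reduced to a set of required child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document types of one collection:
# root element name -> child elements required by the common schema.
DOCUMENT_TYPES = {
    "City": {"Name"},
    "Country": {"Name", "Capital"},
}

def assign_document_type(xml_text):
    """Return the assigned document type, or None if stored without schema checking."""
    root = ET.fromstring(xml_text)
    required = DOCUMENT_TYPES.get(root.tag)  # assignment by root element type
    if required is None:
        return None  # no appropriate document type: store unchecked
    present = {child.tag for child in root}
    missing = required - present
    if missing:
        raise ValueError(f"{root.tag} lacks required elements: {sorted(missing)}")
    # Open content model: additional elements and attributes are tolerated.
    return root.tag

city = ('<City Inhabitants="138000"><Name>Darmstadt</Name>'
        '<Addition>The city of art nouveau</Addition></City>')
note = "<Note>no document type declared for this root</Note>"
print(assign_document_type(city))
print(assign_document_type(note))
```

The City document is accepted although Addition and Inhabitants are not part of the sketched schema, which is the open content model at work.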
Tamino schema example
Document accepted by Tamino
<City Inhabitants="138000">
  <Name>Darmstadt</Name>
  <Addition>The city of art nouveau</Addition>
  <Monument Height="39m">
    <Name>Langer Ludwig</Name>
    <Location>
      <Name>Luisenplatz</Name>
      <MapIndex>M5</MapIndex>
    </Location>
  </Monument>
</City>
When should an element/attribute be modeled?
An index will be defined on the element/attribute.
The element/attribute is to be mapped to an external data source or to a server extension.
Dedicated access rights will be defined on the element/attribute.
The presence or multiplicity of the element is to be enforced.
One of the above conditions holds for a child of the element.
Indexing of Tamino
Value-based indexes
  Well known from traditional database systems
  Used to accelerate search
  Must address the data object exactly
  Names need not be unique within a DTD
Example of value-based index
Value-based indexes: data-centric view
<!ELEMENT City (Name, Inhabitants, Monument+)>
<!ELEMENT Monument (Name, Description)>
<!ELEMENT Inhabitants (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
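Because Name occurs both under City and under Monument, a value-based index has to be keyed by the full path rather than by the element name alone. A minimal sketch, assuming a simple in-memory index; build_value_index and the two sample documents are made up for illustration:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_value_index(docs):
    """Map (element path, text value) -> set of document ids."""
    index = defaultdict(set)

    def walk(elem, parent_path, doc_id):
        path = f"{parent_path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            index[(path, text)].add(doc_id)
        for child in elem:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return index

# Made-up documents following the DTD above; document 2 is an artificial
# city whose name equals a monument name in document 1.
docs = {
    1: "<City><Name>Darmstadt</Name><Inhabitants>138000</Inhabitants>"
       "<Monument><Name>Langer Ludwig</Name><Description>column</Description></Monument></City>",
    2: "<City><Name>Langer Ludwig</Name><Inhabitants>1</Inhabitants>"
       "<Monument><Name>Gate</Name><Description>old</Description></Monument></City>",
}
index = build_value_index(docs)
print(index[("/City/Monument/Name", "Langer Ludwig")])
print(index[("/City/Name", "Langer Ludwig")])
```

An index keyed only by the name "Name" could not tell these two hits apart.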
Indexing of Tamino (cont'd)
Text indexing
  Document-centric view
  The scope can be limited to a specific part of the document
  The scope might span element content
Example of text index
Text indexing: document-centric view
<statement>
  <author>
    <firstname>Harald</firstname>
    <lastname>Schoning</lastname>
  </author>
  <text>X<italic>M</italic>L and X<italic>S</italic>L are <stressed>very</stressed> important</text>
</statement>
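A text index whose scope spans element content has to ignore the inline markup, so that "XML" is found even though the letters sit in different elements. A rough sketch of that idea; the function names and the word-splitting rule are assumptions made for this illustration:

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def text_of(elem):
    """Concatenate all character data below elem, ignoring nested markup."""
    return "".join(elem.itertext())

def build_text_index(docs, scope):
    """Map lowercase word -> set of document ids, limited to elements named scope."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter(scope):
            for word in re.findall(r"\w+", text_of(elem).lower()):
                index[word].add(doc_id)
    return index

doc = ("<statement><author><firstname>Harald</firstname>"
       "<lastname>Schoning</lastname></author>"
       "<text>X<italic>M</italic>L and X<italic>S</italic>L are "
       "<stressed>very</stressed> important</text></statement>")
index = build_text_index({1: doc}, scope="text")
print(sorted(index))
```

Limiting the scope to the text element keeps the author's name out of this index, while "xml" and "xsl" are indexed as whole words.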
Indexing of Tamino (cont'd)
Structural index
  Useful if multiplicity permits the omission of elements or if no DTD is known
Example
  In a database of all European cities, search for all cities that have an element called "beach"
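A structural index records which documents contain which elements, independent of any values. The sketch below is a simplified assumption of how such an index could look; the city documents are placeholders, not real data:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_structural_index(docs):
    """Map element name -> set of ids of documents containing that element."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            index[elem.tag].add(doc_id)
    return index

# Placeholder documents; only some of them carry an optional beach element.
cities = {
    "city-a": "<City><Name>A</Name><beach>sandy</beach></City>",
    "city-b": "<City><Name>B</Name></City>",
    "city-c": "<City><Name>C</Name><Harbour><beach>rocky</beach></Harbour></City>",
}
structure = build_structural_index(cities)
print(sorted(structure["beach"]))
```

The lookup for "beach" answers the query from the slide without opening any document.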
Querying XML documents
Currently, there is no standardized query language.
XPath allows positioning within a single document.
XPath fits the needs of retrieval in data-centric environments well.
Document-centric environments need a more content-based retrieval facility.
Tamino also supports full-text search.
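To illustrate what "positioning within a single document" means, here is a small sketch using the limited XPath subset of Python's xml.etree.ElementTree. It is not Tamino's query interface; the document is the Darmstadt example from the earlier slide.

```python
import xml.etree.ElementTree as ET

city = ET.fromstring(
    '<City Inhabitants="138000"><Name>Darmstadt</Name>'
    '<Monument Height="39m"><Name>Langer Ludwig</Name>'
    '<Location><Name>Luisenplatz</Name><MapIndex>M5</MapIndex></Location>'
    '</Monument></City>'
)

# Path step by step: only the Name directly below Monument.
monument_names = [e.text for e in city.findall("./Monument/Name")]
# Descendant axis: every Name in the document, in document order.
all_names = [e.text for e in city.findall(".//Name")]
# Attribute predicate: monuments with a given Height value.
tall = city.findall(".//Monument[@Height='39m']")

print(monument_names, all_names, len(tall))
```

Each expression navigates inside one document; querying a set of documents means applying such paths across a collection, which is the part the database has to add.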
Expectations for an XML processor
W3C: the XML recommendation specifies the handling of entities, comments and processing instructions.
User: Tamino should leave comments intact, evaluate no processing instructions, and leave entity references unresolved.
User: the output of a Tamino query should match the specification of an XML processor.
Why not leave entities unresolved?
A query result can be a set of (parts of) matching documents.
The result DTD must then include all the different entity declarations of the original documents.
The definition of an entity might differ from document to document.
So, where the same entity name has different definitions, the entities are renamed and the entity references are changed accordingly.
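The renaming step can be pictured as follows. This is a hypothetical sketch, not the scheme Tamino actually uses: entity declarations are modeled as plain dictionaries, the suffix naming (co_2) is invented, and a clash between a generated name and a name declared later in the same document is not handled.

```python
import re

def merge_entities(docs):
    """docs: list of (entity_declarations, body) pairs.
    Returns (merged_declarations, rewritten_bodies)."""
    merged = {}
    bodies = []
    for declarations, body in docs:
        renames = {}
        for name, value in declarations.items():
            new_name, suffix = name, 1
            # Pick a fresh name while the candidate is taken by a different definition.
            while new_name in merged and merged[new_name] != value:
                suffix += 1
                new_name = f"{name}_{suffix}"
            merged[new_name] = value
            if new_name != name:
                renames[name] = new_name
        # Rewrite entity references of this document to the new names.
        body = re.sub(r"&(\w+);",
                      lambda m: f"&{renames.get(m.group(1), m.group(1))};",
                      body)
        bodies.append(body)
    return merged, bodies

docs = [
    ({"co": "Software AG"}, "<p>&co; builds Tamino</p>"),
    ({"co": "Carleton University"}, "<p>course at &co;</p>"),
]
merged, bodies = merge_entities(docs)
print(merged)
print(bodies)
```

Both results can now share one result DTD, because each entity name has a single definition.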
Problems of external entities
These entities can change without the database system knowing about it.
Thus, the values of external entities must not be included in indexes.
Example:
<!ENTITY mysubject SYSTEM "http://www.softwareag.com/hottopic.xml">
...
<ticker>Today's hot topic: &mysubject;</ticker>
Checking the current contents of the external entity would lead to unacceptable response times.
Relational Databases and XML
Major (object-)relational database systems include some form of XML support.
The simplest form is to generate XML documents from existing relational data.
But real database handling of XML requires that XML data can be stored and retrieved.
Two approaches
XML support approach (1)
Map the XML document to relational tables and their columns.
Markup is discarded on storage and reconstructed on retrieval.
Advantage of this approach: the contents of an XML document can be handled with traditional SQL.
XML support approach (1) (cont'd)
Shortcoming: sequence information is lost.
The stored order:
<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="19" Quantity="8"/>
</Order>
The retrieval of the order:
<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="19" Quantity="8"/>
</Order>
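The effect can be reproduced with a small sketch using SQLite from Python's standard library. The table layout is an assumption for illustration. Since SQL guarantees no row order without ORDER BY, the retrieval by key is written out explicitly, and the extra position column is one possible workaround that is not part of the mapping described on the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET

order_xml = ('<Order CustomerId="567" Date="12-12-2000">'
             '<Item ProductID="17" Quantity="2"/>'
             '<Item ProductID="16" Quantity="9"/>'
             '<Item ProductID="19" Quantity="8"/></Order>')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (product_id INTEGER, quantity INTEGER, position INTEGER)")

# Shred the document: markup is dropped, attribute values become columns.
root = ET.fromstring(order_xml)
for position, item in enumerate(root.findall("Item")):
    conn.execute("INSERT INTO item VALUES (?, ?, ?)",
                 (int(item.get("ProductID")), int(item.get("Quantity")), position))

# Retrieval by key: document order is gone.
by_key = [r[0] for r in conn.execute("SELECT product_id FROM item ORDER BY product_id")]
# Retrieval by the added position column: document order is restored.
by_position = [r[0] for r in conn.execute("SELECT product_id FROM item ORDER BY position")]
print(by_key, by_position)
```

Without a column like position, nothing in the table records that item 17 came first.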
XML support approach (1) (cont'd)
For data-centric documents the sequence might not matter, but for document-centric documents it does.
This approach loses all comments and processing instructions.
Mixed content cannot be stored easily in this model.
XML support approach (2)
Leave the XML document intact and store it in a large text field ("BLOB"), or even outside the database.
Text search is possible.
Search can be limited by a text-based condition.
XML support approach (2) (cont'd)
Limitations:
No structure-aware combinations are possible.
Value-based search is not supported on these text fields.
IBM solution: side tables. But direct manipulation of side tables destroys the consistency of the database.
Security can be defined at the document level only, not on elements or attributes.
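Why plain text search over a stored document is not structure-aware can be seen in a short sketch. It uses SQLite and a LIKE condition as a stand-in for a text search facility; the table and the two documents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO doc VALUES (?, ?)", [
    (1, "<City><Name>A</Name><beach>sandy</beach></City>"),
    (2, "<City><Name>B</Name><Addition>no beach nearby</Addition></City>"),
])

# Text condition on the whole field: element names and element content
# are indistinguishable, so both documents match.
hits = [r[0] for r in conn.execute(
    "SELECT id FROM doc WHERE body LIKE '%beach%' ORDER BY id")]
print(hits)
```

Only document 1 has a beach element, yet document 2 is returned as well because the word appears in its text.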
Summary
Tamino was designed with particular attention to XML.
Schema handling for XML differs from schema handling in relational databases.
In schema handling, external entities cause conceptual problems.
Value-based indexes are useful for XML, as are text indexes and structural indexes.
Comments and processing instructions should be preserved when documents are stored.
The result of a query against an XML database should be XML.
Q&A
Thanks!