Main challenges in XML/Relational mapping Juha Sallinen Hannes Tolvanen

Main challenges in XML/Relational mapping

Juha Sallinen

Hannes Tolvanen


Agenda

• Introduction: XML and databases

• Objectives of the study

• Findings

• Conclusions


Introduction: XML and databases


Basic definitions

• XML/relational mapping means data transformation between XML and relational data models

• Mapping method is the way the mapping is done


Native vs. Relational

• Why store XML documents in a relational database and not in a native XML database?

– Immaturity of current native XML database technology

– Emerging technology, no “de facto” standard

– Well-working relational databases currently in use

  • Efficient and usable

  • May have been in use for years


Mapping dilemma

• XML data model supports much more flexible data structures than relational model

• Two fundamental differences:

– XML tags

– Nested structure of XML elements vs. flat structure of relational tables

• If an XML document does not originate from another relational data source, the data may not fit a relational schema very well


Dichotomy of mapping methods

• There are two fundamentally different techniques for storing XML documents in a relational database

– LOB presentation

– Composed presentation


LOB presentation

• LOB stands for Large Object

• One XML document is put into a single column of a relational table

• At least one column for indexing is also needed

• Does not take full advantage of a classical relational database (no XML extensions)

– Not possible to use SQL to query XML elements

• Not a very interesting choice!
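The LOB presentation can be illustrated with a minimal sketch. It assumes an in-memory SQLite database; the table name xml_documents and the column names doc_id and content are hypothetical and do not come from the study.

```python
import sqlite3

# Hypothetical table and column names; the slides do not prescribe any.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xml_documents ("
    " doc_id INTEGER PRIMARY KEY,"  # the column used for indexing
    " content TEXT NOT NULL)"       # the whole XML document as one large object
)

doc = (
    "<addressbook>"
    "<personname>eddy example</personname>"
    "<address>mannerheimintie 10, 00000 helsinki</address>"
    "</addressbook>"
)
conn.execute("INSERT INTO xml_documents (doc_id, content) VALUES (?, ?)", (1, doc))

# Retrieval by key returns the original document unchanged.
stored = conn.execute(
    "SELECT content FROM xml_documents WHERE doc_id = ?", (1,)
).fetchone()[0]

# Without XML extensions, plain SQL sees only an opaque string. Substring
# matching is possible, but it is not aware of elements.
matches = conn.execute(
    "SELECT doc_id FROM xml_documents WHERE content LIKE ?", ("%helsinki%",)
).fetchall()
```

The LIKE query finds the document, but it cannot tell whether "helsinki" occurs in an address or in a person's name, which is the limitation the slide points at.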


Composed presentation

• Data structure of an XML document is ”shredded” over one or more tables

• Example: Different elements to different columns

• Multiple ways to do this

– Table-based and object-relational mapping will be introduced later


Objectives of the study


• Find and explain the main issues to be considered when converting an XML schema to a relational schema

– In other words: the main challenges that have to be taken into account by

  • Designers of XML/relational mapping methods

  • Users who need to map the data explicitly

• Find and describe briefly two general mapping methods based on composed presentation


Findings


Issues to consider in mapping

• Some of the most essential data characteristics

– Existence of schema definition document

– Stability of the schema

– Degree of structure

• Usage model for data

– Queries against the database

– Requirement of preserving “hidden” information

• DBMS implementation

– Not covered by the study, because the scope was limited to the classical relational model


Data characteristics: Existence of XML schema definition

• A schema definition says how the structure of XML documents conforming to the schema is restricted

– XSD (XML Schema Definition) and DTD (Document Type Definition) are currently the dominant standards for defining an XML schema.

• If we have the schema definition, the conversion to a relational schema will be based on it.

• If we don’t have the schema definition, we have to guess how the structure of the given XML vocabulary is restricted.

– Guesses are based on the data in instances of the vocabulary (XML documents). In other words, we extract the schema from the available data.

– This is not unproblematic, as the next example shows


Data characteristics: Existence of XML schema definition 2 - Example

• Illustration of the problem of extracting the schema from data:

<addressbook>
<personname>eddy example</personname>
<address>mannerheimintie 10, 00000 helsinki</address>
</addressbook>

• We might deduce from the document that we wish to restrict the schema to

<!ELEMENT addressbook (personname, address)>
<!ELEMENT personname (#PCDATA)>
<!ELEMENT address (#PCDATA)>


Data characteristics: Existence of XML schema definition 2 - Example continued

• But if the following document is received from the data source, we have to extend our relational schema, dismiss the data that the relational schema doesn’t support (the summer cottage’s address), or combine the two fields:

<addressbook>
<personname>person2</personname>
<address>jämeräntaival 10, 02150 espoo</address>
<summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress>
</addressbook>

• We can alter the database schema by adding an extra column to the table mapped from the addressbook element to support the new information

– However, this solution cannot be applied if we don’t know that the relation between person and summer cottage is 1:1. We might get documents containing persons that have many summer cottage addresses, and we would again have to alter the database schema, this time by creating a property table for the addresses.
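The guessing described above can be sketched as a small inference step over the documents seen so far. This is only an illustration, assuming flat documents like the address book examples; the function name infer_child_cardinality is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def infer_child_cardinality(documents):
    """Return {child_tag: (min_count, max_count)} observed under the root."""
    per_doc = [Counter(child.tag for child in ET.fromstring(d)) for d in documents]
    tags = set().union(*per_doc)
    return {
        tag: (min(c[tag] for c in per_doc), max(c[tag] for c in per_doc))
        for tag in sorted(tags)
    }


doc1 = (
    "<addressbook>"
    "<personname>eddy example</personname>"
    "<address>mannerheimintie 10, 00000 helsinki</address>"
    "</addressbook>"
)
doc2 = (
    "<addressbook>"
    "<personname>person2</personname>"
    "<address>jämeräntaival 10, 02150 espoo</address>"
    "<summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress>"
    "</addressbook>"
)

# The guess after one document, and the revised guess after two.
after_first = infer_child_cardinality([doc1])
after_second = infer_child_cardinality([doc1, doc2])
```

After the second document, summerCottageAddress appears with an observed range of (0, 1). A maximum of 1 only means that no document seen so far contains more, so it does not establish a 1:1 relation; a later document with several summer cottage addresses would still force a schema change.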


Evolving schema

• If the schema of an XML vocabulary is defined but changes over time, the corresponding changes must be made to the relational schema

• Changes to the relational schema are not always as easy to make as in the previous example (if the composed approach is used)

• The likelihood of schema changes should be evaluated.


Degree of structure of the XML schema

• Categorization used in the study:

1. Structured data

  • Data is totally independent of the presentation used to describe it.

  • Document can be navigated without examining it first

2. Semi-structured data

  • Some blocks of the document may contain optional parts

3. Marked-up text

  • Documents require the preservation of “hidden” information

  • E.g. HTML documents

• These terms have different meanings in the literature. Information on the following slide is based on the definitions on this slide.


Degree of structure of the XML schema

• Structured documents can easily be mapped to a database using composed presentation. Semi-structured documents can also be decomposed if a schema definition is provided. If mixed content is included, it depends on the usage of the data whether LOB presentation is better for the mixed content block than further fragmentation.

• The requirement to preserve “hidden” information in marked-up text is discussed later.


Storing mixed content to relations

• Mixed content: document elements embedded in character data, e.g.

<h1>example</h1><p>here you have a <b>short</b> example</p>

• Designing a relational schema to store mixed content

– If there are blocks in the content that make sense only as a whole, decomposing those blocks makes no sense.

– If we have strong arguments for decomposing a block containing mixed content, one possible decomposition method is to create one table for the root element, one property table for character data, and a property table for every element that appears in the content.


Mixed content mapping example

• DTD

<!ELEMENT A (#PCDATA | B | C)*>
<!ELEMENT B (#PCDATA)>
<!ELEMENT C (#PCDATA)>

• Example instance:

<A>Here we have a <B>nice</B> example <C>!</C></A>

• Relational schema

– A(a_pk)

– B(a_fk, b, bOrder)

– C(a_fk, c, cOrder)

– PCDATA(a_fk, pcdata, pcdataOrder)
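The mapping above can be tried out with a short sketch. It assumes an in-memory SQLite database and, as an assumption not stated on the slide, treats the three order columns as one shared position counter over the child nodes of A, so that the original sequence can be rebuilt. The function names shred and reassemble are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (a_pk INTEGER PRIMARY KEY);
CREATE TABLE B (a_fk INTEGER REFERENCES A(a_pk), b TEXT, bOrder INTEGER);
CREATE TABLE C (a_fk INTEGER REFERENCES A(a_pk), c TEXT, cOrder INTEGER);
CREATE TABLE PCDATA (a_fk INTEGER REFERENCES A(a_pk), pcdata TEXT, pcdataOrder INTEGER);
""")

# Only B and C may appear inside A according to the DTD.
INSERTS = {
    "B": "INSERT INTO B (a_fk, b, bOrder) VALUES (?, ?, ?)",
    "C": "INSERT INTO C (a_fk, c, cOrder) VALUES (?, ?, ?)",
}


def shred(conn, a_pk, xml_text):
    """Store one A element; positions are shared across B, C and PCDATA."""
    root = ET.fromstring(xml_text)
    conn.execute("INSERT INTO A (a_pk) VALUES (?)", (a_pk,))
    position = 0

    def add_text(text):
        nonlocal position
        if text:
            conn.execute("INSERT INTO PCDATA VALUES (?, ?, ?)", (a_pk, text, position))
            position += 1

    add_text(root.text)            # character data before the first child
    for child in root:
        conn.execute(INSERTS[child.tag], (a_pk, child.text or "", position))
        position += 1
        add_text(child.tail)       # character data following this child


def reassemble(conn, a_pk):
    """Rebuild the A element by merging the three tables on position."""
    rows = conn.execute(
        "SELECT bOrder, 'B', b FROM B WHERE a_fk = :a "
        "UNION ALL SELECT cOrder, 'C', c FROM C WHERE a_fk = :a "
        "UNION ALL SELECT pcdataOrder, '', pcdata FROM PCDATA WHERE a_fk = :a "
        "ORDER BY 1",
        {"a": a_pk},
    ).fetchall()
    parts = [f"<{tag}>{escape(text)}</{tag}>" if tag else escape(text)
             for _, tag, text in rows]
    return "<A>" + "".join(parts) + "</A>"


instance = "<A>Here we have a <B>nice</B> example <C>!</C></A>"
shred(conn, 1, instance)
rebuilt = reassemble(conn, 1)
```

If each order column counted only within its own table, the interleaving of text and elements could not be recovered, which is why the sketch uses a single counter.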


Usage models for data: Type of queries executed against the database

• The spectrum of queries

– Queries that retrieve XML documents

– Queries that retrieve fragments of XML documents

– Queries that make transformations on XML data

– And even more complex queries...


Query examples 1

• Sample documents

<addressbook>
<personname>person1</personname>
<streetaddress>jämeräntaival 10</streetaddress>
<city>espoo</city>
<summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress>
</addressbook>

<addressbook>
<personname>person2</personname>
<streetaddress>smt 10</streetaddress>
<city>espoo</city>
<summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress>
</addressbook>

• Query emitting XML fragment: “select the names of persons who live in Espoo”

<personname>person1</personname>
<personname>person2</personname>


Query examples 2

• Query making transformation: “select the number of persons living in Espoo”

<numberOfPersonsInEspoo>2</numberOfPersonsInEspoo>
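Both example queries can be reproduced with a small sketch, assuming the sample documents have been decomposed into one table with a column per element. The table name addressbook and the use of an in-memory SQLite database are assumptions made for the illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

COLUMNS = ("personname", "streetaddress", "city", "summerCottageAddress")

documents = [
    "<addressbook><personname>person1</personname>"
    "<streetaddress>jämeräntaival 10</streetaddress><city>espoo</city>"
    "<summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress></addressbook>",
    "<addressbook><personname>person2</personname>"
    "<streetaddress>smt 10</streetaddress><city>espoo</city>"
    "<summerCottageAddress>hiekkatie 7, 99999 oulu</summerCottageAddress></addressbook>",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addressbook "
    "(personname TEXT, streetaddress TEXT, city TEXT, summerCottageAddress TEXT)"
)
for doc in documents:
    root = ET.fromstring(doc)
    conn.execute(
        "INSERT INTO addressbook VALUES (?, ?, ?, ?)",
        tuple(root.findtext(col) for col in COLUMNS),
    )

# Query emitting XML fragments: names of persons who live in Espoo.
name_fragments = [
    f"<personname>{escape(name)}</personname>"
    for (name,) in conn.execute(
        "SELECT personname FROM addressbook WHERE city = ? ORDER BY personname",
        ("espoo",),
    )
]

# Query making a transformation: number of persons living in Espoo.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM addressbook WHERE city = ?", ("espoo",)
).fetchone()
count_fragment = f"<numberOfPersonsInEspoo>{count}</numberOfPersonsInEspoo>"
```

With composed presentation both queries are ordinary SQL, and only the final step wraps the result in XML tags.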


Preservation of “hidden” information

• The XML document contains “hidden” information that is related to the presentation of the data, not the data itself.

– Order of elements

– Comments

– Whitespace

• It might be required that original XML documents can be retrieved

– Trivial when LOB presentation is used

– If composed presentation is used, all “hidden” information needs to be stored in relations
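A small sketch shows what is lost when only element names and values are decomposed. It assumes a hypothetical address book document with a comment and indentation, parsed with Python's xml.etree.ElementTree, whose default parser discards comments.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

original = (
    "<addressbook>\n"
    "  <!-- home address -->\n"
    "  <address>smt 10</address>\n"
    "</addressbook>"
)

# A values-only decomposition: keep just element names and their text.
# The default ElementTree parser has already dropped the comment here.
values = [(child.tag, child.text) for child in ET.fromstring(original)]

# Rebuild a document from the stored values alone.
rebuilt = "<addressbook>" + "".join(
    f"<{tag}>{escape(text or '')}</{tag}>" for tag, text in values
) + "</addressbook>"

data_preserved = values == [("address", "smt 10")]
document_preserved = rebuilt == original
```

The data survives, but the comment and the whitespace do not, so the original document cannot be returned unless that information is stored in additional columns or tables.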


Table-based mapping

<Tables>
  <Table_1>
    <Row>
      <Column_1>...</Column_1>
      ...
      <Column_n>...</Column_n>
    </Row>
    ...
  </Table_1>
  ...
  <Table_n>
    <Row>
      <Column_1>...</Column_1>
      ...
      <Column_m>...</Column_m>
    </Row>
    ...
  </Table_n>
</Tables>

• Listing 1. Required structure of XML document in table-based mapping (Bourret, 2001).
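A document with the structure of Listing 1 can be loaded almost mechanically. The following is a minimal sketch, assuming an in-memory SQLite database, untyped columns, and rows that all carry the same columns; the function names and the sample table person are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def quote(identifier):
    """Quote an SQL identifier taken from an element name."""
    return '"' + identifier.replace('"', '""') + '"'


def load_tables(conn, xml_text):
    """Create and fill one table per child element of <Tables>."""
    for table in ET.fromstring(xml_text):
        rows = [{col.tag: col.text for col in row} for row in table]
        if not rows:
            continue
        columns = list(rows[0])  # assumes every row has the same columns
        col_sql = ", ".join(quote(c) for c in columns)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {quote(table.tag)} ({col_sql})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO {quote(table.tag)} ({col_sql}) VALUES ({placeholders})",
            [tuple(row.get(c) for c in columns) for row in rows],
        )


sample = """
<Tables>
  <person>
    <Row><name>person1</name><city>espoo</city></Row>
    <Row><name>person2</name><city>espoo</city></Row>
  </person>
</Tables>
"""

conn = sqlite3.connect(":memory:")
load_tables(conn, sample)
loaded = conn.execute("SELECT name, city FROM person ORDER BY name").fetchall()
```

The simplicity comes at a price: only documents that already have this table-shaped structure can be mapped this way.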


Object-Relational mapping

• A method for mapping any XML document that has a schema definition.

• The idea is to convert the schema of the document to an object schema, and then convert the object schema to a relational schema

• The object/relational conversion step is predefined, but the XML/object conversion leaves some freedom to define the object view that is mapped from the XML data.
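The two steps can be sketched as follows. This is an illustration under assumptions, not the method from the study: the object view is a hypothetical Person dataclass chosen by hand (the free step), and the object/relational step uses one simple fixed rule, a class becomes a table and each scalar field becomes a TEXT column.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import astuple, dataclass, fields


@dataclass
class Person:
    """Hypothetical object view of an <addressbook> document."""
    personname: str
    address: str


def xml_to_object(xml_text):
    """XML/object step: read each field from the child element of the same name."""
    root = ET.fromstring(xml_text)
    return Person(*(root.findtext(f.name) for f in fields(Person)))


def create_table_for(conn, cls):
    """Object/relational step: class -> table, scalar field -> column."""
    columns = ", ".join(f"{f.name} TEXT" for f in fields(cls))
    conn.execute(f"CREATE TABLE {cls.__name__} ({columns})")


def insert_object(conn, obj):
    placeholders = ", ".join("?" for _ in fields(obj))
    conn.execute(
        f"INSERT INTO {type(obj).__name__} VALUES ({placeholders})", astuple(obj)
    )


doc = (
    "<addressbook>"
    "<personname>eddy example</personname>"
    "<address>mannerheimintie 10, 00000 helsinki</address>"
    "</addressbook>"
)

conn = sqlite3.connect(":memory:")
create_table_for(conn, Person)
insert_object(conn, xml_to_object(doc))
row = conn.execute("SELECT personname, address FROM Person").fetchone()
```

A different object view, for example one that splits the address into street and city, would lead to a different relational schema through the same fixed second step.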


Conclusions


• Choosing among the possible relational representations for XML data involves many issues that must be considered.

• Some of the issues limit the choice to LOB presentation (no schema, rapidly evolving schema, queries that only retrieve original documents)

• LOB presentation can also be used for storing blocks of the document that are not referenced from elsewhere.

• The usual reason why the decomposition method is preferred where possible is the performance gain. The data also becomes more accessible to applications that use the database but don’t publish any views of the data in XML.