XML To Relational Model
Key Index – Forward Traversal
Backward Traversal
Binary Approach
Bname(source, ordinal, flag, target)
Create as many tables as there are distinct subelement and attribute names in the XML document
Partition Edge Table by name
Universal table – Take outer join of all binary tables
Universal Table with Overflow
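A minimal sketch of partitioning an edge table by name into Bname tables. The sample rows, the "ref"/"val" flag values, and the function name partition_by_name are assumptions made for illustration, not part of the slides.

```python
from collections import defaultdict

# Hypothetical edge table rows: (source, ordinal, name, flag, target).
# The flag values "ref" (target is a node id) and "val" (target is text)
# are assumed here for illustration.
EDGES = [
    (0, 1, "Company", "ref", 1),
    (1, 1, "Name", "val", "Skynet Hitech"),
    (1, 2, "Department", "ref", 2),
    (2, 1, "Name", "val", "Research"),
    (2, 2, "Manager", "val", "John Smith"),
]

def partition_by_name(edges):
    """Split the edge table into one binary table per distinct name."""
    tables = defaultdict(list)
    for source, ordinal, name, flag, target in edges:
        # The name moves into the table name, so each row keeps
        # only (source, ordinal, flag, target).
        tables["B" + name].append((source, ordinal, flag, target))
    return dict(tables)

binary_tables = partition_by_name(EDGES)
```

Each distinct name yields one table, so a query on a single element name touches only that table.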
Converting Ordered XML to Relations
Skynet Hitech. Company
<Company>
  <Name>Skynet Hitech</Name>
  <Department>
    <Name>Research</Name>
    <Manager>John Smith</Manager>
    <Employee>Tom Jackson</Employee>
  </Department>
  <Department>
    <Name>Sales</Name>
    <Manager>Linda White</Manager>
    <Employee>Kevin Lee</Employee>
  </Department>
</Company>
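Ordered XML model for Skynet Hitech. Company
[Figure: ordered tree; the number after each node is its position among its siblings]
Company (1)
  Name (1): Skynet Hitech
  Department (2)
    Name (1): Research
    Manager (2): John Smith
    Employee (3): Tom Jackson
  Department (3)
    Name (1): Sales
    Manager (2): Linda White
    Employee (3): Kevin Lee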
Schema of the storing table
Attributes
  ID: the unique index for each tuple
  DID: the document ID
  Path: the path from the root to the leaf node, used to find a particular node
  Surrogate Pattern: number representation of nodes
  Value: text value associated with each node
Numbering nodes
[Figure: the same Company tree, with the nodes on the path to "Linda White" numbered: Company 1[1], second Department 2[2], Manager 2[1]]
Tuple that stores “Linda White”
  ID: 00334
  DID: 501
  Path: Company/Department/Manager
  Surrogate Pattern: 1[1]2[2]2[1]
  Value: Linda White
Old Skynet file stored in the RDBMS
OLD
Path                          Surrogate Pattern   Value
Company/Name                  1[1]1[1]            Skynet Hitech
Company/Department/Name       1[1]2[1]1[1]        Research
Company/Department/Manager    1[1]2[1]2[1]        John Smith
Company/Department/Employee   1[1]2[1]3[1]        Tom Jackson
Company/Department/Name       1[1]2[2]1[1]        Sales
Company/Department/Manager    1[1]2[2]2[1]        Linda White
Company/Department/Employee   1[1]2[2]3[1]        Kevin Lee
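The rows above can be reproduced with a short sketch. The numbering rule used here is inferred from the table values, not stated in the slides: in each step n[k], n is the index of the element name in order of first appearance among its siblings, and k counts occurrences of that name. The function name shred is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<Company><Name>Skynet Hitech</Name>
<Department><Name>Research</Name><Manager>John Smith</Manager>
<Employee>Tom Jackson</Employee></Department>
<Department><Name>Sales</Name><Manager>Linda White</Manager>
<Employee>Kevin Lee </Employee></Department>
</Company>"""

def shred(elem, path="", pattern="1[1]"):
    """Yield (Path, Surrogate Pattern, Value) for every leaf element."""
    path = f"{path}/{elem.tag}" if path else elem.tag
    children = list(elem)
    if not children:
        yield (path, pattern, (elem.text or "").strip())
        return
    name_index = {}  # tag -> index by first appearance among siblings
    seen = {}        # tag -> number of occurrences so far
    for child in children:
        name_index.setdefault(child.tag, len(name_index) + 1)
        seen[child.tag] = seen.get(child.tag, 0) + 1
        step = f"{name_index[child.tag]}[{seen[child.tag]}]"
        yield from shred(child, path, pattern + step)

rows = list(shred(ET.fromstring(XML)))
```

Under this assumed rule, the second Department gets step 2[2] rather than 3[1], which matches the stored pattern 1[1]2[2]2[1] for "Linda White".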
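[Figure: DTD graph. Element nodes: book, booktitle, article, title, contactauthor, monograph, editor, author, name, address, firstname, lastname. Attribute nodes: authorID, authorid, name. Operator nodes: "*" (twice) and "?" (twice).]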
<!ELEMENT book (booktitle, author)>
<!ELEMENT booktitle (#PCDATA)>
<!ELEMENT author (name, address)>
<!ATTLIST author id ID #REQUIRED>
<!ELEMENT name (firstname?, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT address ANY>
<!ELEMENT article (title, author*, contactauthor)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT contactauthor EMPTY>
<!ATTLIST contactauthor authorID IDREF #IMPLIED>
<!ELEMENT monograph (title, author, editor)>
<!ELEMENT editor (monograph*)>
<!ATTLIST editor name CDATA #REQUIRED>
Basic Inline Algorithm
A relation is created for the root element of the element graph
All of the element's descendants are inlined into that relation, with two exceptions:
  Children below a “*” node are made into separate relations (this corresponds to creating a new relation for a set-valued child)
  Each node that has a backpointer edge pointing to it is made into a separate relation
Drawbacks
Grossly inefficient for many queries
  “List all authors having first name Jack” has to be executed as the union of 5 separate queries
Creates a large number of relations
To determine the set of relations to be created for an element, we construct an element graph by…
Do a DFS traversal of the DTD graph, starting at the element node for which we are constructing relations
Each node is marked as “visited” the first time it is reached and is unmarked once all its children have been traversed
If an unmarked node in the DTD graph is reached during the DFS, a new node bearing the same name is created in the element graph
A regular edge is created from the most recently created node in the element graph with the same name as the DFS parent of the current DTD node to the newly created node
If an attempt is made to traverse an already marked DTD node, a backpointer edge is added from the most recently created node in the element graph to the most recently created node in the element graph with the same name as the marked DTD node
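The traversal can be sketched as follows. This is a simplified illustration: the DTD graph is encoded as a plain dictionary of element names, operator nodes such as “*” and “?” are left out, and the names DTD_GRAPH and element_graph are hypothetical.

```python
# Hypothetical, simplified DTD graph: element name -> child element names.
DTD_GRAPH = {
    "editor": ["monograph"],
    "monograph": ["title", "author", "editor"],
    "author": ["name", "address"],
    "name": ["firstname", "lastname"],
}

def element_graph(dtd, start):
    """Build the element graph for `start` by DFS over the DTD graph."""
    nodes = []     # (node_id, name) in creation order
    edges = []     # (from_id, to_id, "regular" | "back")
    latest = {}    # name -> id of the most recently created node
    marked = set() # DTD nodes currently on the DFS path

    def visit(name, parent):
        if name in marked:
            # Already marked: add a backpointer edge instead of a new node.
            edges.append((latest[parent], latest[name], "back"))
            return
        marked.add(name)
        node_id = len(nodes)
        nodes.append((node_id, name))
        if parent is not None:
            edges.append((latest[parent], node_id, "regular"))
        latest[name] = node_id
        for child in dtd.get(name, []):
            visit(child, name)
        marked.discard(name)  # unmark once all children are traversed

    visit(start, None)
    return nodes, edges

nodes, edges = element_graph(DTD_GRAPH, "editor")
back_edges = [e for e in edges if e[2] == "back"]
```

Because nodes are unmarked after their children are traversed, an element reachable along two different paths is copied into the element graph twice, while a cycle such as editor/monograph produces a single backpointer edge.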
Fragmentation: Example
Results in 5 relations
Just retrieving the first and last names of an author requires three joins!
<!ELEMENT author (name, address)>
<!ATTLIST author id ID #REQUIRED>
<!ELEMENT name (firstname?, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT address ANY>
author (authorID: integer, id: string)
name (nameID: integer, authorID: integer)
firstname (firstnameID: integer, nameID: integer, value: string)
lastname (lastnameID: integer, nameID: integer, value: string)
address (addressID: integer, authorID: integer, value: string)
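A small sketch of the three joins, using an in-memory SQLite database. The sample rows are made up for illustration; firstname is joined with LEFT JOIN on the assumption that the optional element (firstname?) may be absent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author    (authorID INTEGER, id TEXT);
CREATE TABLE name      (nameID INTEGER, authorID INTEGER);
CREATE TABLE firstname (firstnameID INTEGER, nameID INTEGER, value TEXT);
CREATE TABLE lastname  (lastnameID INTEGER, nameID INTEGER, value TEXT);
-- Illustrative data only.
INSERT INTO author    VALUES (1, 'a1');
INSERT INTO name      VALUES (10, 1);
INSERT INTO firstname VALUES (100, 10, 'Jack');
INSERT INTO lastname  VALUES (200, 10, 'Smith');
""")

# author -> name -> firstname / lastname: three joins for one full name.
names = conn.execute("""
SELECT f.value, l.value
FROM author a
JOIN name n           ON n.authorID = a.authorID
LEFT JOIN firstname f ON f.nameID = n.nameID
JOIN lastname l       ON l.nameID = n.nameID
""").fetchall()
```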
Shared Inlining Method
Relations are created for…
  All elements in the DTD graph whose nodes have an in-degree greater than one (nodes with an in-degree of one are inlined)
  Elements that have an in-degree of zero
  Elements below a “*” node
  Of mutually recursive elements all having in-degree one, one of them is made a separate relation
Each element node X that is a separate relation inlines all nodes Y that are reachable from it such that the path from X to Y does not contain a node that is to be made a separate relation
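A rough sketch of the in-degree and “*” rules applied to the example DTD. The dictionary encoding and the function names are assumptions; operator nodes are folded into a below_star flag, and the mutual-recursion rule is not implemented because the only cycle in this DTD (monograph/editor) is already broken by the “*” rule.

```python
# Hypothetical encoding of the DTD graph: parent -> [(child, below_star)].
DTD = {
    "book": [("booktitle", False), ("author", False)],
    "article": [("title", False), ("author", True), ("contactauthor", False)],
    "monograph": [("title", False), ("author", False), ("editor", False)],
    "editor": [("monograph", True)],
    "author": [("name", False), ("address", False)],
    "name": [("firstname", False), ("lastname", False)],
}

def separate_relations(dtd):
    """Elements with in-degree 0 or > 1, plus elements below a '*' node."""
    nodes = set(dtd) | {c for kids in dtd.values() for c, _ in kids}
    in_degree = {n: 0 for n in nodes}
    starred = set()
    for kids in dtd.values():
        for child, below_star in kids:
            in_degree[child] += 1
            if below_star:
                starred.add(child)
    return {n for n in nodes if in_degree[n] != 1} | starred

def inlined_into(dtd, root, relations):
    """Nodes reachable from root without passing through another relation."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        for child, _ in dtd.get(node, []):
            if child not in relations and child not in out:
                out.append(child)
                stack.append(child)
    return out

relations = separate_relations(DTD)
author_inlined = sorted(inlined_into(DTD, "author", relations))
```

For this DTD the sketch selects book, article, monograph, title, and author as relations, and inlines name, firstname, lastname, and address into author.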
Issues with Sharing Elements
Parent of elements not fixed at schema level
Need to store type and ids of parents
  parentCODE field (type of parent)
  parentID field (id of parent)
  No foreign key relationship
Hybrid
Same as Shared except that it inlines some elements not inlined in Shared
  Inlines elements with in-degree greater than one that are not recursive or reached through a “*” node
Set sub-elements and recursive elements are treated as in Shared
book (bookID: integer, book.booktitle.isroot: boolean, book.booktitle : string)
article (articleID: integer, article.contactauthor.isroot: boolean, article.contactauthor.authorid: string)
monograph (monographID: integer, monograph.parentID: integer, monograph.parentCODE: integer, monograph.editor.isroot: boolean, monograph.editor.name: string)
title (titleID: integer, title.parentID: integer, title.parentCODE: integer, title: string)
author (authorID: integer, author.parentID: integer, author.parentCODE: integer, author.name.isroot: boolean, author.name.firstname.isroot: boolean, author.name.firstname: string, author.name.lastname.isroot: boolean, author.name.lastname: string, author.address.isroot: boolean, author.address: string, author.authorid: string)
Shared Inline
Hybrid