
Habiba Skalli
ID: 001C543860
XML and Data Management
For: Dr. Haddouti
Fall 2003

Storing XML using Relational Model

Outline:

1. Introduction:
   a. Why Store XML?
   b. Where to Store XML?
2. XML Storage Requirements.
3. Strategies for Storing XML.
4. Mapping XML to Relational Model.
5. XML Enabled Databases:
   a. Microsoft SQL Server 2000.
   b. Oracle 8i/9i.
   c. IBM DB2.
6. Conclusion.
7. References.

1. Introduction:

XML is becoming the standard for business-to-business information exchange, and the amount of information exchanged using XML is growing very fast. But why do we need to store XML? The reasons are:

- XML documents are conceived as a transitory form of data.
- XML is not designed to facilitate efficient data retrieval and storage.
- Processing and accessing data in large XML files is a time-consuming process.
- XML data should be stored in a consistent and efficient manner to enforce data integrity. [1, 5]

XML documents can be stored in an RDBMS, an ODBMS, or a native XML database. The focus of this paper is on the relational database model.

2. XML Storage Requirements:

XML documents are either data centric, document centric, or semi-structured.

a. Data Centric Documents:

An XML document is data centric if it has a regular structure and contains updateable data (e.g. invoice documents, purchase orders, flight schedules).


“Traditional relational databases are typically better at dealing with data centric requirements.” The technique appropriate for storing data centric XML documents is to map the structure of the XML document to the database, that is, to store the document contents in relational tables. [2]

An example of a data centric document:

<order>
  <customer>Meyer</customer>
  <position>
    <isbn>1-234-56789-0</isbn>
    <number>2</number>
    <price currency="Euro">30.00</price>
  </position>
</order>
[17]

b. Document Centric Documents:

Document centric data tends to be more unpredictable in size and content (e.g. newspaper content, articles, and advertisements). Native XML databases and content management systems are typically better at storing document centric data. A content management system is an application designed to manage documents, and it can be built on top of a native XML database. [2]

c. Semi Structured Documents:

The difference between data centric and document centric documents is not always easy to identify. Semi-structured documents are a combination of the two: they mix regular, data-like fields with free-form text. [2]

An example of a semi-structured document:

<movie>
  <title>Insomnia</title>
  <year>2002</year>
  <company>Warner Bros</company>
  <description>Sent to a small Alaska town to investigate the murder of a teenage girl, a veteran police officer (Al Pacino) is forced…</description>
</movie>


3. Strategies for Storing XML:

The storage of XML can be achieved in three different ways: structure based storing, model based storing and text based storing.

a. Structure Based Storing (e.g. STORED, POET):

In structure based storing, the database schema represents the logical structure of the XML documents (or of their DTDs, if they are available). Therefore, a relation or a class is created for each element type in the XML documents. [14]

Take the example of STORED (Semi-structured TO Relational Data). STORED is a query language that specifies the mapping between XML (semi-structured data) and relational data. This mapping is generated automatically using data mining techniques. Mapping with STORED is lossless because an overflow graph is used to store the data that does not fit into the schema. Once the STORED mapping has been generated, the system is ready for inserts and updates in the database. [19]

b. Model Based Storing (e.g. Excelon’s Xis, Infonyte):

In model based storing, a fixed database schema is used to store the structure of all XML documents. [14]

Take the example of Xis. Xis is a native XML database that stores XML data directly as Document Object Model (DOM) trees. It makes changes in the structure and data of XML documents dynamically and in real time by processing only those XML elements or sub-elements needed for a particular business process or transaction. [20]
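The fixed-schema idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the storage layout of Xis or Infonyte, which the sources do not describe: the `node` table, its columns, and the function name `store_document` are all assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

def store_document(conn, doc_id, xml_text):
    """Store every element of a document in one generic table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " doc_id INTEGER, node_id INTEGER, parent_id INTEGER,"
        " name TEXT, content TEXT)"
    )
    counter = 0

    def walk(elem, parent_id):
        nonlocal counter
        counter += 1
        node_id = counter
        content = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (doc_id, node_id, parent_id, elem.tag, content),
        )
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)
    return counter  # number of element nodes stored

conn = sqlite3.connect(":memory:")
movie = "<movie><title>Insomnia</title><year>2002</year></movie>"
order = "<order><customer>Meyer</customer></order>"
n_movie = store_document(conn, 1, movie)
n_order = store_document(conn, 2, order)

# The same table answers structural queries for any document type.
titles = conn.execute(
    "SELECT content FROM node WHERE doc_id = 1 AND name = 'title'"
).fetchall()
```

Because the schema never changes, documents with completely different structures (a movie and an order here) land in the same table; the price is that every query has to navigate parent/child links instead of reading named columns.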

c. Text Based Storing:

Text based storing / method 1:

Store the entire document as a single large object, such as a Character Large OBject (CLOB) or Binary Large OBject (BLOB), in the relational database. This strategy is appropriate for document centric data. All leading RDBMS vendors support storing the entire document (Microsoft SQL Server 2000, Oracle8i, IBM DB2). This method is simple, but it does not allow flexible indexing and searching of the data. [2]
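As a minimal sketch of method 1, the following Python stores a document in a single text column using SQLite as a stand-in for the RDBMS. The table name `documents`, the column `body`, and the sample article are assumptions for illustration only.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")

article = "<article><title>Storing XML</title><body>Some text</body></article>"
conn.execute("INSERT INTO documents (id, body) VALUES (?, ?)", (1, article))

# Retrieval returns the document exactly as it was stored.
(stored,) = conn.execute("SELECT body FROM documents WHERE id = 1").fetchone()

# The database sees only an opaque string, so finding the title means
# fetching the whole document and parsing it in the application.
title = ET.fromstring(stored).findtext("title")
```

The round trip is trivial and lossless, which is the appeal of this method; the limitation is visible in the last step, where the search happens in application code rather than in an indexed query.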

Text Based storing / method 2:

Store the entire document in the file system, with a pointer to that file stored in the database. This method is useful if the XML documents are few and infrequently updated, and it is supported by the leading database vendors. But it has two problems: (1) it is not very flexible for storing and retrieving data, and (2) it raises security concerns, since the files are stored outside the database. [2]
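Method 2 can be sketched the same way. Here the database holds only a path, and the document itself lives in a temporary directory; the table `doc_files` and the helper `load_document` are hypothetical names chosen for this example.

```python
import os
import sqlite3
import tempfile

doc_dir = tempfile.mkdtemp()
path = os.path.join(doc_dir, "order-1.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write("<order><customer>Meyer</customer></order>")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_files (id INTEGER PRIMARY KEY, path TEXT)")
conn.execute("INSERT INTO doc_files VALUES (?, ?)", (1, path))

def load_document(conn, doc_id):
    """Look up the path in the database, then read the file."""
    (file_path,) = conn.execute(
        "SELECT path FROM doc_files WHERE id = ?", (doc_id,)
    ).fetchone()
    with open(file_path, encoding="utf-8") as f:
        return f.read()

content = load_document(conn, 1)

# The file lives outside the database: deleting it leaves a dangling
# pointer that the database cannot detect or prevent.
os.remove(path)
dangling = not os.path.exists(path)
remaining_pointers = conn.execute("SELECT COUNT(*) FROM doc_files").fetchone()[0]
```

The last lines illustrate the weakness noted above: the database still lists the document after the file is gone, because file access is outside its control.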


4. Mapping XML to Relational Model:

The most widely used strategy for storing XML in a relational DB is to map the XML document structure to the database, which means that the XML data is stored in relational tables (called side tables). This method allows easy search, update and retrieval of the data stored in the database.

The problem here is how to do the mapping between XML and the database, knowing that the relational model is based on primary keys, foreign keys, tables, rows and columns, whereas XML is built on a different principle: a hierarchy of nested elements.

The differences between XML and relational databases can be summarised as follows:

[1]

Therefore the mapping between XML and relational model can be achieved as follows:

[2]

Mapping Example: [15]

<SalesOrder>
  <Number>1234</Number>
  <Customer>Gallagher Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part>
    <Quantity>12</Quantity>
    <Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part>
    <Quantity>600</Quantity>
    <Price>3.99</Price>
  </Item>
</SalesOrder>

This XML document can be mapped into the following tables:

SaleOrders
----------
Number   Customer               Date
------   --------------------   --------
1234     Gallagher Industries   29.10.00
...      ...                    ...
...      ...                    ...

Items
-----
SONumber   Item   Part   Quantity   Price
--------   ----   ----   --------   -----
1234       1      A-10   12         10.95
1234       2      B-43   600        3.99
...        ...    ...    ...        ...
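The mapping above can be sketched as a small "shredding" routine. This is an illustrative sketch using SQLite and Python's standard XML parser, not the tooling of reference [15]; the function name `shred` and the chosen column types are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

SALES_ORDER = """
<SalesOrder>
  <Number>1234</Number>
  <Customer>Gallagher Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price>
  </Item>
</SalesOrder>
"""

def shred(conn, xml_text):
    """Map the root element to one row and each Item to a child row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS SaleOrders ("
        " Number INTEGER PRIMARY KEY, Customer TEXT, Date TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Items ("
        " SONumber INTEGER REFERENCES SaleOrders(Number),"
        " Item INTEGER, Part TEXT, Quantity INTEGER, Price REAL)"
    )
    order = ET.fromstring(xml_text.strip())
    number = int(order.findtext("Number"))
    conn.execute(
        "INSERT INTO SaleOrders VALUES (?, ?, ?)",
        (number, order.findtext("Customer"), order.findtext("Date")),
    )
    for item in order.findall("Item"):
        conn.execute(
            "INSERT INTO Items VALUES (?, ?, ?, ?, ?)",
            (
                number,                      # foreign key to the parent order
                int(item.get("Number")),     # the Number attribute of <Item>
                item.findtext("Part"),
                int(item.findtext("Quantity")),
                float(item.findtext("Price")),
            ),
        )

conn = sqlite3.connect(":memory:")
shred(conn, SALES_ORDER)
orders = conn.execute("SELECT * FROM SaleOrders").fetchall()
items = conn.execute("SELECT * FROM Items ORDER BY Item").fetchall()
```

Note how the nesting of `<Item>` inside `<SalesOrder>` becomes the foreign key `SONumber`, and how the `Number` attribute and the child elements of `<Item>` are both flattened into ordinary columns.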

Problems with mapping:

There are several problems associated with the mapping between semi-structured documents and relational tables. First, the mapping between XML and the DB is not natural, because a hierarchy has to be expressed as flat tables. Second, the XML documents have to be cut into pieces and inserted into the relational tables. Third, to query the database, tables have to be joined, and with large tables this process becomes time consuming. Finally, it takes time to restore the entire XML document. [18]

A solution to these problems with the relational model is to use native XML databases.
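The third and fourth problems can be made concrete with a short sketch. Assuming the two tables from the mapping example (recreated here so the example is self-contained, with `Price` kept as text to preserve its formatting), restoring one document requires a join followed by reassembly in application code. The function name `rebuild_order` is hypothetical, and the sketch assumes the order has at least one item.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SaleOrders (Number INTEGER, Customer TEXT, Date TEXT);
CREATE TABLE Items (SONumber INTEGER, Item INTEGER, Part TEXT,
                    Quantity INTEGER, Price TEXT);
INSERT INTO SaleOrders VALUES (1234, 'Gallagher Industries', '29.10.00');
INSERT INTO Items VALUES (1234, 1, 'A-10', 12, '10.95');
INSERT INTO Items VALUES (1234, 2, 'B-43', 600, '3.99');
""")

def rebuild_order(conn, number):
    """Join both tables and reassemble the pieces into one document."""
    rows = conn.execute(
        "SELECT o.Number, o.Customer, o.Date,"
        "       i.Item, i.Part, i.Quantity, i.Price"
        " FROM SaleOrders o JOIN Items i ON i.SONumber = o.Number"
        " WHERE o.Number = ? ORDER BY i.Item",
        (number,),
    ).fetchall()
    root = ET.Element("SalesOrder")
    # Order-level columns repeat on every joined row; take them once.
    for tag, value in zip(("Number", "Customer", "Date"), rows[0][:3]):
        ET.SubElement(root, tag).text = str(value)
    for _, _, _, item_no, part, quantity, price in rows:
        item = ET.SubElement(root, "Item", Number=str(item_no))
        ET.SubElement(item, "Part").text = part
        ET.SubElement(item, "Quantity").text = str(quantity)
        ET.SubElement(item, "Price").text = price
    return ET.tostring(root, encoding="unicode")

xml_text = rebuild_order(conn, 1234)
```

Even for this two-table document, one join and a pass over every joined row are needed; each additional level of nesting in the XML adds another table to the join.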

5. XML Enabled Databases:

Various database vendors have developed efficient solutions for automatic conversion of XML into and out of relational databases: Microsoft's SQL Server 2000, Oracle 8i, Oracle 9i and IBM DB2. The tools and techniques offered to retrieve and store the XML data vary.

a. Microsoft SQL Server 2000:

SQL Server 2000 introduced several features for storing and retrieving XML documents:

- SELECT statement result sets can be mapped into XML documents by using the FOR XML clause.
- XML documents can be stored into the database using OPENXML. [2]

Retrieving XML documents from the database using FOR XML:

In general, the SQL code that generates an XML document from the database is:

SELECT select_list
FROM table_source
WHERE search_condition
FOR XML AUTO | RAW | EXPLICIT [, XMLDATA] [, ELEMENTS] [, BINARY BASE64]

FOR XML – example: [12]

Assume we have these two tables in the SQL database:


Orders:

Order Details:

Using this SELECT statement we will construct an XML document that will contain information about an order (create an invoice):

SELECT Invoice.OrderID InvoiceNo, OrderDate, ProductID, UnitPrice Price, Quantity
FROM Orders Invoice JOIN [Order Details] Item
ON Invoice.OrderID = Item.OrderID
WHERE Invoice.OrderID = 10248
FOR XML AUTO, ELEMENTS

The SELECT statement gives the following result:

<Invoice>
    <InvoiceNo>10248</InvoiceNo>
    <OrderDate>1996-07-04T00:00:00</OrderDate>
    <Item>
        <ProductID>11</ProductID>
        <Price>14</Price>
        <Quantity>12</Quantity>
    </Item>
    <Item>
        <ProductID>42</ProductID>
        <Price>9.8</Price>
        <Quantity>10</Quantity>
    </Item>
    <Item>
        <ProductID>72</ProductID>
        <Price>34.8</Price>
        <Quantity>5</Quantity>
    </Item>
</Invoice>
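To make the shape of this result easier to follow, the sketch below reproduces it in Python from the flat rows that the join would return. It is not how SQL Server implements FOR XML; it only illustrates the nesting rule, namely that columns of the first table alias become the parent element and columns of the second become repeated children. The function name `nest_rows` is hypothetical, and prices are kept as strings so they print as in the result above.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Rows as the join of Orders and [Order Details] would return them.
rows = [
    (10248, "1996-07-04T00:00:00", 11, "14", 12),
    (10248, "1996-07-04T00:00:00", 42, "9.8", 10),
    (10248, "1996-07-04T00:00:00", 72, "34.8", 5),
]

def nest_rows(rows):
    """Group joined rows by their Invoice columns and nest the Items."""
    documents = []
    for (invoice_no, order_date), group in groupby(rows, key=lambda r: r[:2]):
        invoice = ET.Element("Invoice")
        ET.SubElement(invoice, "InvoiceNo").text = str(invoice_no)
        ET.SubElement(invoice, "OrderDate").text = order_date
        for _, _, product_id, price, quantity in group:
            item = ET.SubElement(invoice, "Item")
            ET.SubElement(item, "ProductID").text = str(product_id)
            ET.SubElement(item, "Price").text = price
            ET.SubElement(item, "Quantity").text = str(quantity)
        documents.append(ET.tostring(invoice, encoding="unicode"))
    return documents

invoices = nest_rows(rows)
```

The three joined rows collapse into a single `<Invoice>` with three `<Item>` children, which matches the structure of the document shown above.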

- AUTO returns the results as nested XML elements. Each element is named after its table by default; a table alias (such as Invoice and Item above) renames it.
- RAW returns one XML element for each row in the resulting rowset.
- EXPLICIT lets the query itself define the shape of the XML, producing both elements and attributes.
- XMLDATA is optional and returns an XDR schema describing the result along with the document.
- ELEMENTS, used in AUTO mode, instructs the creation of sub-elements instead of attributes.
- BINARY BASE64 returns binary data (such as an image) in base64-encoded form. [12]

Storing XML documents into the database using OPENXML:

The storing is done in three steps:

1. Compile the XML document into an internal DOM representation and obtain an “XML document handle” using the stored procedure sp_xml_preparedocument.
2. Use OPENXML to expose the document as a rowset, which can then be inserted into database tables.
3. Remove the compiled XML document from memory using the stored procedure sp_xml_removedocument. [3]

OPENXML – Example: [9]

DECLARE @idoc int
DECLARE @doc varchar(1000)
SET @doc = '
<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
   <Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
      <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
   </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
   ..
</Customer>
</ROOT>'
-- Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
-- Execute a SELECT statement that uses the OPENXML rowset provider.
SELECT *
FROM OPENXML (@idoc, '/ROOT/Customer', 1)
     WITH (CustomerID varchar(10), ContactName varchar(20))
-- Remove the internal representation from memory.
EXEC sp_xml_removedocument @idoc

The result set is:

CustomerID   ContactName
----------   --------------
VINET        Paul Henriot
LILAS        Carlos Gonzlez

Storing XML documents into the SQL Server database uses OPENXML (for insert and update). OPENXML is an extension to Transact-SQL that provides a rowset view over an XML document.


The syntax:

OPENXML(idoc int [in], rowpattern nvarchar[in], [flags byte[in]]) [WITH (SchemaDeclaration | TableName)]

- idoc is the document handle of the internal representation of an XML document, created by calling sp_xml_preparedocument.
- rowpattern is the XPath pattern used to identify the nodes to be processed as rows.
- flags indicates the mapping that should be used between the XML data and the relational rowset. [9]
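The roles of the three parameters can be illustrated outside SQL Server with a rough Python analogue. This is a sketch of the idea only, not the OPENXML implementation: `open_rowset` is a hypothetical helper, the row pattern is written relative to the root element because Python's ElementTree does not accept absolute paths such as /ROOT/Customer, and the second customer is written as an empty element because its content is elided in the example above.

```python
import xml.etree.ElementTree as ET

DOC = """
<ROOT>
  <Customer CustomerID="VINET" ContactName="Paul Henriot">
    <Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
      <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
    </Order>
  </Customer>
  <Customer CustomerID="LILAS" ContactName="Carlos Gonzlez"/>
</ROOT>
"""

def open_rowset(root, row_pattern, columns):
    """Return one tuple per node matched by row_pattern.

    Each column is read from the attribute of the same name, which
    mirrors the attribute-centric mapping selected by flags = 1.
    """
    return [
        tuple(node.get(col) for col in columns)
        for node in root.findall(row_pattern)
    ]

# Step 1: parse the text into a tree (the role of sp_xml_preparedocument).
root = ET.fromstring(DOC.strip())

# Step 2: the row pattern decides which nodes become rows.
customers = open_rowset(root, "./Customer", ["CustomerID", "ContactName"])
details = open_rowset(
    root, "./Customer/Order/OrderDetail", ["OrderID", "ProductID", "Quantity"]
)

# Step 3: release the parsed tree (the role of sp_xml_removedocument).
del root
```

Changing only the row pattern turns the same document into a different rowset: one row per customer in the first call, one row per order detail in the second.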

SQL: mapping XML schema into DB schema:

One way to map an XML schema to a relational schema (and the opposite) is to transform the XML schema into a mapping schema. It is the mapping schema that allows the mapping of elements and attributes to tables and columns, and the retrieval of relational data as XML documents.

XML View Mapper is a utility for SQL Server 2000 that allows the creation of an XDR (XML-Data Reduced) schema from relational tables. It also allows the creation of a mapping schema corresponding to the XML schema. XML View Mapper can also generate an XDR schema from an XML document, or create a schema from a DTD. [21, 22]

NB: XDR schemas were created by Microsoft to be used with their products.

Example of mapping schema:

<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes"
        xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <ElementType name="Employee" sql:relation="Employees">
    <AttributeType name="EmpID" />
    <AttributeType name="FName" />
    <AttributeType name="LName" />
    <attribute type="EmpID" sql:field="EmployeeID" />
    <attribute type="FName" sql:field="FirstName" />
    <attribute type="LName" sql:field="LastName" />
  </ElementType>
</Schema>
[21]

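The sql:relation and sql:field annotations (bolded in the original) are what make this schema a mapping schema: sql:relation ties the element to a table, and sql:field ties each attribute to a column.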
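What the annotations accomplish can be sketched with a plain dictionary that holds the same information. This is an illustration of the idea, not SQL Server's mapping-schema processor: the `MAPPING` structure, the functions `select_for` and `row_to_xml`, and the sample employee values are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed in-memory form of the annotations in the schema above:
# sql:relation names the table, sql:field names each column.
MAPPING = {
    "element": "Employee",
    "relation": "Employees",
    "attributes": {
        "EmpID": "EmployeeID",
        "FName": "FirstName",
        "LName": "LastName",
    },
}

def select_for(mapping):
    """Build the SELECT that fetches the mapped columns."""
    columns = ", ".join(mapping["attributes"].values())
    return f"SELECT {columns} FROM {mapping['relation']}"

def row_to_xml(mapping, row):
    """Render one row (a dict keyed by column name) as a mapped element."""
    attrs = {
        xml_name: str(row[column])
        for xml_name, column in mapping["attributes"].items()
    }
    return ET.tostring(ET.Element(mapping["element"], attrs), encoding="unicode")

query = select_for(MAPPING)
# Sample values below are invented for the illustration.
employee = row_to_xml(
    MAPPING, {"EmployeeID": 1, "FirstName": "Ann", "LastName": "Lee"}
)
```

The same mapping is read in both directions: once to decide which columns to select, and once to decide which XML attribute each column value is written to.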

b. Oracle 8i - 9i:

Oracle offers the XML SQL Utility (XSU), a set of Java classes that:

– models XML document elements as a collection of nested tables and allows insert, update and delete.
– generates XML documents from SQL query results or a JDBC result set object. [8]


[2]

Extracting XML documents from the database:

We want to get the result set of the following SQL statement into an XML document:

SELECT Title, Author, Publisher, Year, ISBN
FROM BookList
WHERE BookID = 1234

The XML SQL Utility can be used to generate a DTD based on the schema of the underlying table being queried. The following simple code creates an XML document that contains the result set of the query:

import java.sql.*;
import java.math.*;
import oracle.xml.sql.query.*;
import oracle.jdbc.*;
import oracle.jdbc.driver.*;

public class read_sample
{
    public static void main(String args[]) throws SQLException
    {
        String user = "scott/tiger";
        DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());

        // Initialize a JDBC connection by passing in the user
        Connection conn = DriverManager.getConnection("jdbc:oracle:oci8:" + user + "@");

        // Initialize the OracleXMLQuery with the JDBC connection and the SQL query
        OracleXMLQuery qry = new OracleXMLQuery(conn, "select * from Booklist WHERE BookID = 1234");

        // Get the XML document as a string, which allows us to print it
        String xmlString = qry.getXMLString();

        // Print the result to the screen
        System.out.println(" OUTPUT IS:\n" + xmlString);

        // Close the JDBC connection
        conn.close();
    }
}
[13]

The query results in the creation of the following XML document:


<?xml version="1.0"?>
<ROWSET>
  <ROW id="1">
    <TITLE>Oracle 9i</TITLE>
    <AUTHOR>Mike Wilson</AUTHOR>
    <PUBLISHER>William Morrow and Co.</PUBLISHER>
    <YEAR>1997</YEAR>
    <ISBN>0688149251</ISBN>
  </ROW>
</ROWSET>
[13]
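The ROWSET/ROW layout shown above follows directly from the shape of a result set: one ROW element per row, one child element per column. The sketch below reproduces that layout with SQLite and Python so the rule is visible; it does not use or imitate the XSU API, the function name `query_to_rowset` is hypothetical, and the table is reduced to three columns for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_rowset(conn, sql, params=()):
    """Wrap each result row in <ROW> and the whole result in <ROWSET>."""
    cursor = conn.execute(sql, params)
    # Column names of the result set become the element names.
    names = [col[0].upper() for col in cursor.description]
    rowset = ET.Element("ROWSET")
    for number, row in enumerate(cursor, start=1):
        row_elem = ET.SubElement(rowset, "ROW", id=str(number))
        for name, value in zip(names, row):
            ET.SubElement(row_elem, name).text = str(value)
    return ET.tostring(rowset, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BookList (BookID INTEGER, Title TEXT, Author TEXT)")
conn.execute("INSERT INTO BookList VALUES (1234, 'Oracle 9i', 'Mike Wilson')")
xml_text = query_to_rowset(
    conn, "SELECT Title, Author FROM BookList WHERE BookID = ?", (1234,)
)
```

Because the element names come from the result-set metadata, the same function works for any query without a per-table mapping.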

Storing XML documents into the database:

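Storing the XML document into the database uses the following instructions:

// Initialize the OracleXMLSave class with the connection and the target table name
OracleXMLSave sav = new OracleXMLSave(conn, "Booklist");
// Insert the rows described by the XML file
sav.insertXML(sav.getURL("simpledoc.xml"));
[13]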

Remark: The XML SQL Utility (XSU) does not allow the storage of attributes. These have to be transformed into elements before the document is stored. [3]
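The transformation mentioned in the remark can be sketched as a small preprocessing step. This is one possible approach, not a procedure taken from the cited sources (in practice an XSLT stylesheet is a common choice); the function name `attributes_to_elements` is hypothetical, and the sketch does not handle name clashes between an attribute and an existing child element.

```python
import xml.etree.ElementTree as ET

def attributes_to_elements(elem):
    """Recursively replace every attribute with a leading child element."""
    # Convert the existing children first, before new ones are inserted.
    for child in list(elem):
        attributes_to_elements(child)
    for position, (name, value) in enumerate(elem.attrib.items()):
        promoted = ET.Element(name)
        promoted.text = value
        elem.insert(position, promoted)
    elem.attrib.clear()
    return elem

source = '<Item Number="1"><Part>A-10</Part></Item>'
converted = ET.tostring(
    attributes_to_elements(ET.fromstring(source)), encoding="unicode"
)
```

After the conversion, the `Number` attribute of the `<Item>` element has become a `<Number>` child element, so every value in the document is carried by an element.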

Oracle: mapping XML schema into DB schema:

Oracle9i Release 2 has a new feature called XML DB Repository. This repository allows XML documents to be stored directly in the Oracle9i Database, provided that an XML schema is registered. Once the XML schema is registered, the XML-to-relational mapping is done automatically. After the XML data has been stored in the RDB, an XML document can be restored with the same DOM representation as the original XML document. [23]

Oracle 9i – XML Type: [24]

Oracle9i Database implements a number of standards-based functions to query relational data and return XML documents: XMLType, XMLELEMENT…

XMLType is a datatype that holds XML data. An XMLType view contains the result of an SQL query in the form of an XML document.

Syntax:
CREATE OR REPLACE VIEW STUDENT OF XMLType WITH …

c. IBM DB2: [11]

IBM DB2 offers DB2 XML Extender. It allows the storage of XML documents in two ways:


– XML Column: allows the storage and retrieval of the entire XML document as column data.
– XML Collection: decomposes/composes the XML document into/from a collection of relational tables.

Document Access Definition:

DB2 XML Extender provides a mapping scheme called a Document Access Definition (DAD). A DAD is a file that specifies how an XML document is mapped to relational data, using either XML Columns or XML Collections, based on the DTDs.

A unique feature of DB2 is the ability to manage and index XML documents located within the file system, a single column, or spread across multiple tables and columns.

6. Conclusion:

Storing XML in the relational model is a large and important topic. We saw that there are several techniques for storing XML documents, among them text based storage and the mapping between XML and the relational model.

The text based model is simple, but it does not allow very flexible search and update. Mapping XML into the relational model is the most popular way to store XML. It allows flexible manipulation and search of the data, but it has a few problems due to the differences between the structure of XML and relational data. Despite these problems, I think that this method is the most appropriate for XML storage, especially for data centric documents.

Oracle 9i, Oracle 8i, Microsoft SQL Server 2000 and IBM DB2 all support storage of XML documents. Oracle uses the XML SQL Utility (XSU), SQL Server uses the OPENXML and FOR XML keywords in SQL statements, and DB2 uses DAD. I think the leading database is Oracle 9i, which has added very important features that make the mapping easier.

7. References:

1. www.eaijournal.com/PDF/StoringXMLChampion.pdf
2. www.acm.org/crossroads/xrds8-4/XML_RDBMS.html
3. http://www.xml.com/pub/a/2001/06/20/databases.html
4. http://www.w3.org/XML/RDB.html
5. http://www.infoloom.com/gcaconfs/WEB/granada99/noe.HTM
6. http://www.hitsw.com/products_services/whitepapers/integrating_xml_rdb/
7. http://www.utdallas.edu/~lkhan/papers/APESXDRD_ProcACM3rdWIDM2001.pdf
8. http://www.xml.com/pub/r/846
9. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_oa-oz_5c89.asp
10. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsql/ac_openxml_94mk.asp
11. Db2XMLeXtender.pdf
12. www.microsoft.com/mspress/books/sampchap/5178a.asp
13. www.devx.com/assets/download/4513.pdf
14. http://nike.psu.edu/classes/ist597/2003-fall/papers/TOIT2001-authorCopy.pdf
15. http://www.xml.com/pub/a/2001/05/09/dtdtodbs.html?page=1
16. http://www.rpbourret.com/xml/ProdsNative.htm
17. Course slides.
18. www.csis.hku.hk/~dbgroup/seminar/wlian020308.ppt
19. http://nike.psu.edu/classes/ist597/2003-fall/papers/deutsch98storing.pdf
20. http://www.nhm.ac.uk/science/rco/enhsin/ENHSIN_Caching.pdf
21. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsql/ac_mschema_5cfn.asp
22. http://www.databasejournal.com/features/mssql/article.php/10894_2235451_2
23. http://otn.oracle.com/oramag/webcolumns/2003/techarticles/scardina_xmldb.html
24. http://otn.oracle.com/oramag/oracle/03-may/o33xml.html
