COMPARATIVE STUDY OF NATIVE XML AND XED MAPPING INTO RELATIONAL DATABASE

By Bharat Nagalia (10503859) and Somil Tyagi (10503867)

CHAPTER 1

INTRODUCTION

• XML is an emerging standard for representing and exchanging data over the internet. XML tags describe the meaning of the content itself: new tag and attribute names can be defined, document structures can be nested to many levels of complexity, and a document can be associated with a type specification called a document type definition (DTD).

• Relational databases are particularly good at storing and querying highly structured information. SQL, as a query language, is designed specifically to query structured data.

• In addition, a well-normalised RDBMS stores data efficiently and with little redundancy, because each unit of information is saved in only one place. RDBMSs are also known for their reliability and scalability, and such systems can be accessed by a very large number of concurrent users.

• Relational systems were not designed to handle the semi-structured content that is often stored as XML. Semi-structured data is often described as schema-less or self-describing, terms which indicate that there is no separate description of the type or structure of the data.

• Semi-structured content is difficult to store in a relational database because it does not map easily to the row-and-column structure of an RDBMS. Hence there is a need for a technique to transform data from XML into relational tables.

• The process of mapping XML is divided into the following phases:

• PARSING THE XML DOCUMENT

• MAPPING THE XML DOCUMENT INTO DATABASE

• QUERY IMPLEMENTATION ON THE DATABASE

1.1 PARSING THE XML DOCUMENT

• Parsing an XML document separates its markup (tags and attributes) from the data it carries.

• Two kinds of parser are mainly used to parse an XML document:

• DOM PARSER

• DOM (Document Object Model) is the most commonly used and easiest way to parse XML in Java. DOM loads the whole XML tree into memory before it can be processed, which is why a large heap size is needed to avoid out-of-memory errors. For a large XML file it is better to use SAX instead of DOM, because loading a large document into memory is not a good choice. DOM is already part of the JDK, so no external JAR is needed to start using it.
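As an illustration of the DOM approach, the following is a minimal sketch using the JDK's own `javax.xml.parsers` API. The sample document and the class and method names (`DomExample`, `parse`, `countElements`) are assumptions made for this example and are not taken from the project.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DomExample {
    // Hypothetical sample document, used only for illustration.
    static final String SAMPLE =
        "<library>"
      + "<book id=\"1\"><title>XML Basics</title></book>"
      + "<book id=\"2\"><title>SQL Basics</title></book>"
      + "</library>";

    /** Parses the XML text into an in-memory DOM tree (the whole document is loaded). */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Counts the elements with the given tag name anywhere in the tree. */
    static int countElements(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("root element: " + doc.getDocumentElement().getTagName());
        System.out.println("book elements: " + countElements(doc, "book"));
    }
}
```

Because the whole tree is held in memory, the tree can be queried repeatedly after parsing, at the cost of memory that grows with the size of the document.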

• SAX PARSER

• The SAX parser differs from the DOM parser in that it does not load the complete XML document into memory. Instead it reads the XML sequentially, triggering events as it encounters constructs such as opening tags, closing tags, character data, comments and so on. This is why the SAX parser is called an event-based parser.

• Along with the XML source, we register a handler that extends the DefaultHandler class. DefaultHandler provides several callbacks, of which we are interested in:

• startElement() – called when the start of a tag is encountered.

• endElement() – called when the end of a tag is encountered.

• characters() – called when text data is encountered.
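The three callbacks can be sketched as follows. This is a minimal example, assuming a handler that only records the events it receives; the class name `SaxEventLogger` and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLogger extends DefaultHandler {
    // Events in the order the parser reports them.
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text run in several chunks; this sketch logs each chunk.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    /** Parses the XML text and returns the recorded events. */
    static List<String> parse(String xml) {
        try {
            SaxEventLogger handler = new SaxEventLogger();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<note><to>Ann</to><body>Hi</body></note>"));
    }
}
```

The handler never sees the document as a whole; it only reacts to events in document order, which is what keeps memory use low for large files.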

http://en.wikipedia.org/wiki/Simple_API_for_XML

http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/DefaultHandler.html

1.2 MAPPING THE XML DOCUMENT INTO DATABASE

• LSDX Labelling

• To facilitate query processing for XML data, several path indexing, labelling and numbering schemes have been proposed. However, if the XML data needs to be updated frequently, most of these approaches have to re-compute existing labels, which is rather time-consuming. In this project we used the Labelling Scheme for Dynamic XML data (LSDX), which supports the representation of ancestor-descendant and sibling relationships between nodes.
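The idea of encoding relationships in the label itself can be sketched as below. This is a simplified illustration in the spirit of LSDX, not the full scheme from the paper: it assumes a label of the form level + ancestor letters + "." + own letter (root `0a`, its children `1a.b`, `1a.c`, and so on), supports at most 25 children per node, and leaves out the update rules that let LSDX avoid relabelling. All class and method names are hypothetical.

```java
class LsdxSketch {
    static final String ROOT = "0a";

    /** Index of the first non-digit character, i.e. where the level number ends. */
    private static int digitsEnd(String label) {
        int i = 0;
        while (i < label.length() && Character.isDigit(label.charAt(i))) {
            i++;
        }
        return i;
    }

    /** Depth of the node; the root is at level 0. */
    static int level(String label) {
        return Integer.parseInt(label.substring(0, digitsEnd(label)));
    }

    /** Letters inherited from the ancestors (the part before the dot); empty for the root. */
    static String prefix(String label) {
        int dot = label.indexOf('.');
        return dot < 0 ? "" : label.substring(digitsEnd(label), dot);
    }

    /** Ancestor letters followed by the node's own letter. */
    static String path(String label) {
        return label.substring(digitsEnd(label)).replace(".", "");
    }

    /** Label of the k-th child (0-based) of the node with the given label. */
    static String childLabel(String parent, int k) {
        if (k < 0 || k > 24) {
            throw new IllegalArgumentException("this sketch handles at most 25 children");
        }
        char own = (char) ('b' + k);
        return (level(parent) + 1) + path(parent) + "." + own;
    }

    /** True if the first label belongs to an ancestor of the second. */
    static boolean isAncestor(String ancestor, String descendant) {
        return prefix(descendant).startsWith(path(ancestor));
    }

    /** True if both labels are distinct nodes with the same parent. */
    static boolean isSibling(String a, String b) {
        return !a.equals(b) && level(a) == level(b) && prefix(a).equals(prefix(b));
    }

    public static void main(String[] args) {
        String first = childLabel(ROOT, 0);
        String grandchild = childLabel(first, 1);
        System.out.println(first + " " + grandchild + " " + isAncestor(ROOT, grandchild));
    }
}
```

With labels of this kind, ancestor-descendant and sibling checks become string comparisons on the labels and need no access to the tree itself.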

• Bridging

• After each element in the XML document has been labelled, the elements are stored in the database together with their unique labels.
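A hedged sketch of this bridging step with JDBC is shown below. The table name `xml_node` and its columns are assumptions made for the example and do not describe the project's actual schema; obtaining the `Connection` (for instance through `DriverManager.getConnection` with a MySQL driver on the classpath) is left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.List;

class NodeBridge {
    // Hypothetical table: one row per labelled element.
    static final String CREATE_SQL =
        "CREATE TABLE xml_node ("
      + "label VARCHAR(255) PRIMARY KEY, "
      + "name VARCHAR(255) NOT NULL, "
      + "value TEXT)";

    static final String INSERT_SQL =
        "INSERT INTO xml_node (label, name, value) VALUES (?, ?, ?)";

    /** Packs one labelled element into the column order used by INSERT_SQL. */
    static String[] row(String label, String name, String value) {
        return new String[] {label, name, value};
    }

    /** Inserts all rows in one batch and returns how many statements were sent. */
    static int insertAll(Connection conn, List<String[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] r : rows) {
                ps.setString(1, r[0]);
                ps.setString(2, r[1]);
                if (r[2] == null) {
                    ps.setNull(3, Types.VARCHAR); // element without text content
                } else {
                    ps.setString(3, r[2]);
                }
                ps.addBatch();
            }
            return ps.executeBatch().length;
        }
    }

    public static void main(String[] args) {
        // Prints the statements only; no database connection is opened here.
        System.out.println(CREATE_SQL);
        System.out.println(INSERT_SQL);
    }
}
```

Using the label as the primary key means the structural relationships between elements can be recovered from the stored rows alone.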

1.3 QUERY IMPLEMENTATION ON THE DATABASE

• Queries are run against the relational database to retrieve results from the stored data. The time taken to compute a result is used as the measure of a database's efficiency.

• Two XML query languages are mainly used to express these queries; for a relational database they are translated into SQL:

• XQUERY

• XPATH
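For reference, an XPath expression can be evaluated directly against an XML document with the `javax.xml.xpath` package that ships with the JDK, as in the minimal sketch below (the JDK has no built-in XQuery processor, so XQuery would need a third-party library). The sample document and the class name `XPathExample` are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathExample {
    // Hypothetical sample document, used only for illustration.
    static final String SAMPLE =
        "<library>"
      + "<book id=\"1\"><title>XML Basics</title></book>"
      + "<book id=\"2\"><title>SQL Basics</title></book>"
      + "</library>";

    /** Evaluates the expression against the XML text and returns the result as a string. */
    static String evaluate(String expression, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("/library/book[@id='2']/title", SAMPLE));
    }
}
```

Results obtained this way can serve as a baseline when checking that the same query, run against the mapped relational tables, returns the same data.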

CHAPTER 2

LITERATURE SURVEY

2.1 SUMMARY OF SOURCES

• EXISTING TOOLS & TECHNOLOGIES:

• Eclipse

• Oxygen XML Editor

• Sedna

• XML validator

• Tizag XML Tutorials

• BOOKS:

• XML Data Management: Native XML and XML-Enabled Database Systems by Akmal B. Chaudhri: the book gives detailed information on native XML and XED, and describes different parsing and mapping techniques.

• CONFERENCES:

• TriKonf 2013, the First Tri-National Translation Conference, held in Germany on 18th-20th October 2013

• XML Summer School 2013, held in Oxford, England on 15th-20th September 2013

• Balisage: The Markup Conference, held in Canada on 6th-9th August 2013

• BaseX Users Meetup, XML Prague, held in Prague, Czech Republic on 8th February 2013

• XML Amsterdam 2012, Connecting XML developers worldwide, held in Amsterdam, Netherlands on 19th September 2013

2.2 SUMMARY OF TABLES

Title of Paper: CPI: Constraints-Preserving Inlining Algorithm for Mapping XML DTD to Relational Schema

Authors: Dongwon Lee, Wesley W. Chu

Year of Publication: 6 July 2001

Summary: This paper presents a method to transform an XML DTD into a relational schema in both structural and semantic aspects. After discussing the semantic constraints hidden in DTDs, two algorithms are presented for: 1) discovering the semantic constraints using the hybrid inlining algorithm, and 2) rewriting the semantic constraints in relational notation.

Publishing Details: Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA. Data & Knowledge Engineering, 08/2001; DOI: 10.1016/S0169-023X(01)00028-3. Source: DBLP

Web Link: www.cobase.cs.ucla.edu/tech-docs/dongwon/cpi-dke.pdf

http://dblp.uni-trier.de/db/journals/dke/dke39.html#LeeC01

Title of Paper: A Study of Native XML Databases: Document Update, Querying, Access Control and Application Programming Interfaces in Native XML Databases

Authors: M. Mercedes Martínez-González, Miguel A. Martínez-Prieto and María Muñoz-Nieto

Year of Publication: March 2009

Summary: The current situation is that there is a standard query language, XQuery, and a standard API, the XML:DB API, but no standards for updates and access control. This makes software applications dependent on the particular native XML database (NXD) used, which is clearly a serious drawback: almost any application needs to update the underlying database, and access control is a common need as well.

Publishing Details: WEBIST 2009, 5th International Conference on Web Information Systems and Technologies.

Web Link: dataweb.infor.uva.es/wp-content/uploads/2012/02/webist09.pdf

Title of Paper: LSDX: A New Labelling Scheme for Dynamically Updating XML Data

Authors: Maggie Duong and Yanchun Zhang

Year of Publication: 2011

Summary: To facilitate query processing for XML data, several path indexing, labelling and numbering schemes have been proposed. However, if the XML data needs to be updated frequently, most of these approaches have to re-compute existing labels, which is rather time-consuming. The paper proposes a new Labelling Scheme for Dynamic XML data (LSDX) that supports the representation of ancestor-descendant and sibling relationships between nodes.

Publishing Details: School of Computer Science and Mathematics, Victoria University

Web Link: crpit.com/confpapers/CRPITV39Duong.pdf

CHAPTER 3

RESULTS OF LITERATURE SURVEY

3.1 INTEGRATED SUMMARY OF LITERATURE

• XML mapping is the process of mapping an XML database into a relational database (tables), so that the data is organised and queries can be processed quickly.

• XML databases are divided into native XML databases and XML-enabled databases (XED), depending on the schema of the database.

• The process of XML mapping includes 3 phases:

• PARSING THE XML DOCUMENT

• MAPPING THE XML DOCUMENT INTO DATABASE

• XML into tree

• Tree into table

• QUERY IMPLEMENTATION ON THE DATABASE

• The availability of different storage options lets the developer and the designer choose a suitable option depending on the structure and size of the XML documents that need to be stored and the type of queries that the XML database should handle.

DETAILED WORKING OF A SAX PARSER

3.2 SOME RELEVANT CURRENT/OPEN PROBLEMS

• Mapping composite keys and composite foreign keys. Current mapping approaches handle only simple keys and simple foreign keys, not composite keys and composite foreign keys.

• Mapping general constraints. General constraints are advanced constraints that give the user more flexibility to decide which operations should be executed to maintain the consistency of relational data. How to map general constraints is still an open problem and, to the best of our knowledge, there is no suitable approach to solve it.

• Providing a maintenance mechanism for integrity constraints in the mapped XML document is still a problem, e.g. mapping options for enforcing referential integrity constraints. In a relational database, if a referential integrity constraint is defined, any command executed on the database is evaluated against it first; if the evaluation fails, the command is rejected. However, none of the surveyed approaches considers the enforcement of referential integrity constraints in the mapped XML document.

3.3 PROBLEM STATEMENT

• We intend to compare the two types of XML database: XML-enabled databases and native XML databases.

• The system must be able to take an arbitrary XML file as input and map it into a database of either of the above types.

• Factors such as the query execution time on these databases will decide which database is better.

3.4 PROPOSED APPROACH

• The proposed approach of this project consists of the following steps:

• Input an XML file.

• Classify it on the basis of its data representation, i.e. NATIVE XML or XED.

• Apply the SAX parser to the XML file.

• The SAX parser separates the data into startElement, endElement and characters events.

• Store these values in a tree using a ranking/labelling algorithm, so that parent-child relationships can be established.

• At each point record the processing time, as it varies with the size and type of the XML file.

• Traverse the parsed tree using BFS or DFS.

• Finally, insert the nodes into the database tables and apply XPath.
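The parsing, tree-building and traversal steps above can be sketched as follows. This is a minimal example, assuming a stack of open elements to establish parent-child relationships and a breadth-first traversal; the class names `TreeBuilder` and `Node` are hypothetical, and the labelling and database steps are omitted.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TreeBuilder extends DefaultHandler {
    /** One element of the parsed document. */
    static final class Node {
        final String name;
        final StringBuilder text = new StringBuilder();
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Elements whose end tag has not been seen yet; the top is the current parent.
    private final Deque<Node> open = new ArrayDeque<>();
    Node root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        Node node = new Node(qName);
        if (open.isEmpty()) {
            root = node;
        } else {
            open.peek().children.add(node);
        }
        open.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            open.peek().text.append(ch, start, length);
        }
    }

    /** Parses the XML text and returns the root of the resulting tree. */
    static Node parse(String xml) {
        try {
            TreeBuilder handler = new TreeBuilder();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.root;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Returns the element names in breadth-first (level) order. */
    static List<String> breadthFirst(Node root) {
        List<String> order = new ArrayList<>();
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            order.add(node.name);
            queue.addAll(node.children);
        }
        return order;
    }

    public static void main(String[] args) {
        long started = System.nanoTime();
        Node root = parse("<a><b><d/></b><c/></a>");
        long micros = (System.nanoTime() - started) / 1_000;
        System.out.println(breadthFirst(root) + " parsed in " + micros + " us");
    }
}
```

The `main` method also shows one way the processing time could be recorded with `System.nanoTime()`; the measured value will vary from run to run and with the size of the input.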

CHAPTER 4

ANALYSIS, DESIGN AND MODELING

4.1 OVERALL DESCRIPTION OF PROJECT

• The project processes an arbitrary XML document.

• Processing begins with parsing the XML document using a SAX parser. The parsed XML document is then processed by an appropriate XML-to-database mapping technique.

• The XML is mapped into two types of database: XML-enabled and native XML.

• Lastly, queries are run on both types of database using XQuery/XPath, and the efficiency with which the queries are resolved decides which database type is more efficient.

• Constraints:

• A limitation of the SAX parser is that the user who implements the “ProcessData” method has no control over the order in which elements are processed. The processing order depends only on the order within the XML file.

• Assumptions:

• It is assumed that there is no syntax error in the input XML file.

4.2 FUNCTIONAL & NON-FUNCTIONAL REQUIREMENTS

• Functional Requirements:

• The system shall be able to take an arbitrary XML file as input and parse it.

• The system shall be able to arrange the parsed XML document into a tree.

• The system shall be able to map the contents of an XML tree into a relational database.

• The system shall be able to map the tree into both an XML-enabled database and a native XML database.

• The system shall be able to generate output on the basis of the queries implemented.

• Non-Functional Requirements:

• System Requirements – The same as those required by the Eclipse IDE.

• Security – The system has no security constraints.

• Performance – The response time depends on the size of the XML file.

• Maintainability – The system is easy to maintain.

• Portability – Java, the language the system employs, is platform independent.

• Reliability – The system shall be able to provide a good level of precision.

USE CASE DIAGRAM

4.3 OVERALL ARCHITECTURE

• The architecture consists of four parts. Part one maps the XML document into the RDB, part two reconstructs the XML document from the RDB, part three translates the user's XQuery queries into SQL statements, and part four translates the SQL results into XML format.

• In part one, the system loads the XML document and parses it with the SAX parser, which shreds the document content into tokens and stores these tokens in a predefined relational schema.

• Part two goes through the relational tables and reconstructs the XML document. It allows the user to insert, delete and update the content of the document and store it in the database again.

• In part three, the user's XQuery queries are translated into SQL statements and run against the database engine to get the results.

• In part four, these results are translated from relational table format into hierarchical XML format and returned to the user.
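To make the query translation step concrete, the sketch below turns a very small subset of path queries (absolute paths with child steps only, such as `/library/book/title`) into SQL self-joins. It assumes a hypothetical table `node(id, parent_id, name, value)` that stores one row per element; neither the table nor the `PathToSql` class is taken from the project, and a full XQuery translator would be considerably more involved.

```java
class PathToSql {
    /**
     * Translates an absolute child-axis path into a SQL query over the
     * hypothetical table node(id, parent_id, name, value).
     */
    static String translate(String path) {
        // Accept only plain element names, so the names are safe to embed in the SQL text.
        if (!path.matches("(/[A-Za-z_][A-Za-z0-9_]*)+")) {
            throw new IllegalArgumentException("unsupported path: " + path);
        }
        String[] steps = path.substring(1).split("/");

        // The first step must match the root element, which has no parent.
        StringBuilder from = new StringBuilder("node n1");
        StringBuilder where = new StringBuilder(
            "n1.parent_id IS NULL AND n1.name = '" + steps[0] + "'");

        // Each further step joins the child rows of the previous step.
        for (int i = 1; i < steps.length; i++) {
            int current = i + 1;
            from.append(" JOIN node n").append(current)
                .append(" ON n").append(current).append(".parent_id = n").append(i).append(".id");
            where.append(" AND n").append(current).append(".name = '").append(steps[i]).append("'");
        }
        return "SELECT n" + steps.length + ".value FROM " + from + " WHERE " + where;
    }

    public static void main(String[] args) {
        System.out.println(translate("/library/book/title"));
    }
}
```

Each path step becomes one join, so deeper paths produce longer join chains; this is one reason why query execution time is a useful measure when comparing storage approaches.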

SYSTEM ARCHITECTURE

4.4 PROPOSED ALGORITHM

• STEP 1: Choose an XML file to be mapped into the relational database.

• STEP 2: Parse the XML file using the SAX parser to separate the data from the markup.

• STEP 3: Label the nodes of the tree using LSDX.

• STEP 4: Map the nodes into a relational database in MySQL.

• STEP 5: Implement queries using XPath/XQuery on the relational database.

GRAPHS GENERATED

DATABASE

APPENDIX

NEW TOOLS & TECHNOLOGIES USED

• Eclipse

• SAX parser

• JAXP

• JDBC

• MySQL Workbench (MySQL server)

REFERENCES

• WEBLINKS:

• www.saxproject.org/apidoc/org/xml/sax/Parser.html

• www.cs.nmsu.edu/~epontell/courses/XML/material/xmlparsers.html

• http://xmldb-org.sourceforge.net/faqs.html

• http://www.rpbourret.com/xml/XMLAndDatabases.htm#datavdocs

• BOOKS:

• XML Data Management: Native XML and XML-Enabled Database Systems by Akmal B. Chaudhri

• RESEARCH PAPERS:

• A Study of Native XML Databases: Document Update, Querying, Access Control and Application Programming Interfaces in Native XML Databases by M. Mercedes Martínez-González, Miguel A. Martínez-Prieto and María Muñoz-Nieto, March 2009

• A Comparative Study Between Two Types of Database Management Systems: XML-Enabled Relational and Native XML by Amin Y. Noaman and Amal Abdullah Al Mansour, July 2012

• A Model Mapping Approach for Storing XML Documents in Relational Databases by Dr. Mrs. Pushpa Suri and Divyesh Sharma, May 2012

• Storing and Querying Ordered XML Using a Relational Database System by Igor Tatarinov, Stratis D. Viglas, Eugene Shekita, Chun Zhang, Jayavel Shanmugasundaram and Kevin Beyer, June 2002
