04-19-2001 1
Rainbow: Bridging XML and Relational Databases Design, Implementation, and Evaluation
MQP Advisor: Prof. Elke A. Rundensteiner
Sponsor: Verizon Laboratories Incorporated
MQP Project Members: Tien Vu, Mirek Cymer, John Lee
04-19-2001 2
HTML vs. XML
Microsoft, IBM, Informix, Oracle, Sun, ...
04-19-2001 3
XML Data Management by RDBMS
Advantages:
• Efficient query and analysis tools.
• Mature database tools available.
• Easy integration with existing business databases.
Issues:
• Mapping between the XML and relational models.
• Update propagation.
• Query translation and optimization.
04-19-2001 4
Motivation for Mapping
• Query performance varies with how the data is mapped.
• Flexible mapping: fixed translation and restructuring.
[Figure: XML tree for car with child elements make (Ford), model (Mustang), and year (2001). Alternate mapping: a single relation car(make, model, year) holding the row (Ford, Mustang, 2001).]
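To make the effect of the mapping choice concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the Rainbow implementation: the table layouts and the name car_inlined are assumptions for illustration, with the iid/pid/value columns borrowed from the SQL shown on later slides. The same question needs a three-way join under one mapping and a single-table scan under the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Mapping 1 (assumed fixed translation): one table per element,
-- each row has its own id (iid) and its parent's id (pid).
CREATE TABLE car   (iid INTEGER PRIMARY KEY, pid INTEGER);
CREATE TABLE make  (iid INTEGER PRIMARY KEY, pid INTEGER, value TEXT);
CREATE TABLE model (iid INTEGER PRIMARY KEY, pid INTEGER, value TEXT);
CREATE TABLE year  (iid INTEGER PRIMARY KEY, pid INTEGER, value TEXT);
INSERT INTO car   VALUES (1, NULL);
INSERT INTO make  VALUES (10, 1, 'Ford');
INSERT INTO model VALUES (11, 1, 'Mustang');
INSERT INTO year  VALUES (12, 1, '2001');

-- Mapping 2 (alternate): the children are inlined as columns of car.
CREATE TABLE car_inlined (iid INTEGER PRIMARY KEY, make TEXT, model TEXT, year TEXT);
INSERT INTO car_inlined VALUES (1, 'Ford', 'Mustang', '2001');
""")

# "Which make and model were built in 2001?" under mapping 1: joins required.
joined = cur.execute("""
    SELECT make.value, model.value
    FROM car, make, model, year
    WHERE make.pid = car.iid
      AND model.pid = car.iid
      AND year.pid = car.iid
      AND year.value = '2001'
""").fetchall()

# The same question under mapping 2: no joins at all.
flat = cur.execute(
    "SELECT make, model FROM car_inlined WHERE year = '2001'"
).fetchall()

print(joined, flat)
```

Both queries return the same answer; only the amount of join work differs, which is the motivation for letting the mapping be restructured.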
04-19-2001 5
Rainbow Architecture
[Figure: architecture diagram. Inputs: DTD, XML, XML Query, XML User. Components: XML Query Engine, DTDM Manager, XML Manager, Restructuring Subsystem, RDBMS. Legend: XML Data, Subsystem.]
04-19-2001 6
Goals of our MQP
What: Implement and evaluate restructuring subsystems within the large-scale Rainbow system.
How:
• Learn about database technologies and web tools.
• Translate research ideas into software system design.
• Practice software engineering techniques: UML, engineering and reusing code.
• Design an experimental test plan and test bed.
• Conduct a performance study and analysis.
04-19-2001 7
Restructuring Subsystem
[Figure: the Restructuring Subsystem within the Rainbow architecture. Inputs: DTD, XML, XML Query, XML User. Components: XML Query Engine, DTDM Manager, XML Manager, Restructuring (Mapping, Restructure Operator Library, Restructurer), Query Storage. Legend: XML Model, Subsystem, Relational Model, Internal Process.]
04-19-2001 8
Restructuring Operators
11 Restructuring Operators:
• Rename Item/Attribute
• Switch Nesting
• Pushup/Pushdown Attribute
• Pushup/Pushdown Nesting
• Split/Merge Nesting
• Reference/Dereference
04-19-2001 9
Mapping: Sequence of Restructure Operators
A mapping is modeled as a sequence of reversible restructuring operators, each given as an operator name plus parameters.
For example:
pushUpAttribute('account_number', 'value', 'invoice', 'account_number');
pushUpAttribute('bill_period', 'value', 'invoice', 'bill_period');
renameItem('invoice', 'summary');
[Figure: before, an empty invoice element with children account_num and bill_period, each holding a value; after, a summary element with attributes account_num and bill_period.]
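One way to picture "a sequence of reversible operators" is to treat the mapping as plain data. The sketch below is illustrative only: the Operator class, the invert helpers, and the assumption that pushDownAttribute accepts the same parameter tuple as pushUpAttribute are hypothetical and not taken from the Rainbow code. Only the operator names and the example parameters come from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Operator:
    """One restructuring step: an operator name plus its parameters."""
    name: str
    params: Tuple[str, ...]


# The example mapping from the slide, written as data.
mapping: List[Operator] = [
    Operator("pushUpAttribute", ("account_number", "value", "invoice", "account_number")),
    Operator("pushUpAttribute", ("bill_period", "value", "invoice", "bill_period")),
    Operator("renameItem", ("invoice", "summary")),
]

# Operator pairs taken from the operator list; treating each pair as
# mutual inverses is an assumption of this sketch.
INVERSE_NAME = {
    "pushUpAttribute": "pushDownAttribute",
    "pushDownAttribute": "pushUpAttribute",
}


def invert(op: Operator) -> Operator:
    """Return the operator that undoes `op`."""
    if op.name == "renameItem":
        old, new = op.params
        return Operator("renameItem", (new, old))
    # Assumption: the inverse operator accepts the same parameter tuple.
    return Operator(INVERSE_NAME[op.name], op.params)


def invert_mapping(ops: List[Operator]) -> List[Operator]:
    """Undo a mapping: invert each operator and apply them in reverse order."""
    return [invert(op) for op in reversed(ops)]


undo = invert_mapping(mapping)
for op in undo:
    print(op.name, op.params)
```

Inverting the inverted sequence gives back the original mapping, which is the property that makes the sequence representation convenient.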
04-19-2001 10
SQLs for Push-Up Attributes
CREATE VIEW new.A (<all-columns>, a) AS
SELECT A.<all-columns>, B.b
FROM old.A, old.B
WHERE B.pid = A.iid

CREATE VIEW new.B (<all-columns-but-b>) AS
SELECT B.<all-columns-but-b>
FROM old.B
[Figure: push-up moves attribute b of child B into parent A, where it appears as attribute a.]
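The two view templates can be instantiated mechanically once the column lists are known. The following is a hedged sketch, not the Rainbow generator: the function name push_up_attribute_sql, the reading of the operator parameters as (child, attribute, parent, new attribute name), and the old_/new_ table-name prefixes (standing in for the old. and new. schemas) are all assumptions made for illustration.

```python
from typing import List, Tuple


def push_up_attribute_sql(child: str, attr: str, parent: str, new_attr: str,
                          parent_cols: List[str],
                          child_cols: List[str]) -> Tuple[str, str]:
    """Instantiate the two push-up view templates for one parent/child pair."""
    # <all-columns-but-b>: every child column except the one being moved.
    kept = [c for c in child_cols if c != attr]

    parent_header = ", ".join(parent_cols + [new_attr])
    parent_select = ", ".join(f"{parent}.{c}" for c in parent_cols)
    child_header = ", ".join(kept)
    child_select = ", ".join(f"{child}.{c}" for c in kept)

    # New parent view: all parent columns plus the pulled-up attribute.
    parent_view = (
        f"CREATE VIEW new_{parent} ({parent_header}) AS "
        f"SELECT {parent_select}, {child}.{attr} "
        f"FROM old_{parent} AS {parent}, old_{child} AS {child} "
        f"WHERE {child}.pid = {parent}.iid"
    )
    # New child view: the child without the moved attribute.
    child_view = (
        f"CREATE VIEW new_{child} ({child_header}) AS "
        f"SELECT {child_select} FROM old_{child} AS {child}"
    )
    return parent_view, child_view


parent_view, child_view = push_up_attribute_sql(
    "account_number", "value", "invoice", "account_number",
    parent_cols=["iid", "pid"],
    child_cols=["iid", "pid", "value"],
)
print(parent_view)
print(child_view)
```

Generating views rather than copying tables keeps the old relations in place, so the restructuring stays cheap to apply and easy to undo.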
04-19-2001 11
Example SQLs
Inline: account_number.value into invoice as attribute account_number.
Mapping:
pushUpAttribute('account_number', 'value', 'invoice', 'account_number');
SQL statements:
CREATE VIEW new.invoice (iid, pid, account_number) AS
SELECT invoice.iid, invoice.pid, account_number.value
FROM old.invoice, old.account_number
WHERE account_number.pid = invoice.iid

CREATE VIEW new.account_number (iid, pid) AS
SELECT account_number.iid, account_number.pid
FROM old.account_number
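The statements above can be tried out on a toy database. This is a sketch under stated assumptions rather than the project's Oracle 8i setup: it uses Python's sqlite3 module, replaces the old. and new. schemas with old_ and new_ table-name prefixes, and loads one hand-made row per table, with the account number taken from the sample invoice elsewhere in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- "Old" relations: invoice has no data of its own, and the account number
-- lives in a child table that points at its parent through pid.
CREATE TABLE old_invoice (iid INTEGER PRIMARY KEY, pid INTEGER);
CREATE TABLE old_account_number (iid INTEGER PRIMARY KEY, pid INTEGER, value TEXT);
INSERT INTO old_invoice VALUES (1, NULL);
INSERT INTO old_account_number VALUES (2, 1, '555 777-3158 573 234');

-- Push-up: invoice gains account_number as an attribute.
CREATE VIEW new_invoice (iid, pid, account_number) AS
SELECT invoice.iid, invoice.pid, account_number.value
FROM old_invoice AS invoice, old_account_number AS account_number
WHERE account_number.pid = invoice.iid;

-- The child keeps every column except the one that moved.
CREATE VIEW new_account_number (iid, pid) AS
SELECT account_number.iid, account_number.pid
FROM old_account_number AS account_number;
""")

invoice_rows = conn.execute(
    "SELECT iid, pid, account_number FROM new_invoice"
).fetchall()

# Column names exposed by the restructured child view.
cursor = conn.execute("SELECT * FROM new_account_number")
child_columns = [d[0] for d in cursor.description]

print(invoice_rows)
print(child_columns)
```

After the push-up, a query for an invoice's account number reads a single view instead of joining parent and child by hand.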
04-19-2001 12
Rainbow Implementation
Development Tools
• Java: Visual Café2, Javadocs, JAVA2
• Oracle 8i, XML 4J, JDBC1.2, SQL Queries
Code Facts
• 44 total system classes
• 17 classes of Rainbow
• 27 classes reused
• ? lines of system code
• ? lines of Rainbow code
• ? lines of code reused
[Figure: chart comparing new and re-used code.]
04-19-2001 13
Screen Shot
04-19-2001 14
Screen Shot
04-19-2001 15
Rainbow Test & Experimental Evaluation
Experimental Setup
• Oracle 8i
• Windows NT
Data
• Created a DTD
• Randomly generated XML
• Hand-translated queries
Factors
• Type of query
• Number of operations
04-19-2001 16
Query Performance Evaluation
[Figure: Query Performance vs. # Restructuring. x-axis: # Operations (0 to 10); y-axis: Query Performance (s), 0 to 0.2; series: pushUpAttribute.]
04-19-2001 17
Rainbow Conclusions
Technical accomplishments
• Functional prototype system
• Feasibility of Rainbow concepts
• Automated test bed designed
• Performance evaluations show that:
  • (Ideal) Moving data up on the embedded-relational level yields better query performance for join queries.
Knowledge gained
• OO, Java, JDBC, SQL, RDBMS, XML, DTD
• Teamwork, S/W engineering, and software reuse
• Logistics of setting up an experiment
Future work
• Experiment test plans and test beds to realize the full potential of the restructuring component.
04-19-2001 18
Rainbow: XML and Relational Database Design, Implementation, and Evaluation
Project Members: Tien Vu, Mirek Cymer, John Lee
Advisor: Elke A. Rundensteiner
Ph.D. Student: Xin Zhang
Sponsored by: Verizon Laboratories Incorporated
Visit Rainbow at http://davis.wpi.edu/dsrg/TJM/
04-19-2001 19
Recycled!!!
04-19-2001 20
XML: The Future of the Web
Benefits:
• Efficient query and analysis tools.
• Mature data warehousing support.
• Easy integration with existing business databases.
Applications:
• E-commerce
• Web-based industries
<invoice>
<account_number>555 777-3158 573 234 </account_number>
<bill_period>Jun 9 - Jul 8, 2000</bill_period>
<carrier>Sprint</carrier>
<itemized_call no="1" date="JUN 10" number_called="973 555-8888" time="10:17pm" rate="NIGHT" min="1" amount="0.05" />
<itemized_call no="2" date="JUN 13" number_called="973 650-2222" time="10:19pm" rate="NIGHT" min="1" amount="0.05" />
<itemized_call no="3" date="JUN 15" number_called="206 365-9999" time="10:25pm" rate="NIGHT" min="3" amount="0.15" />
<total>$0.25</total>
</invoice>
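Because the tags name the data, a program can pull typed records straight out of the invoice. The sketch below uses Python's standard xml.etree.ElementTree parser to turn each itemized_call into a row-like tuple and to check the line items against the stated total. The choice of fields to extract is an illustration only and says nothing about how Rainbow loads documents.

```python
import xml.etree.ElementTree as ET

# The sample invoice from the slide, with plain ASCII quotes.
INVOICE_XML = """
<invoice>
  <account_number>555 777-3158 573 234 </account_number>
  <bill_period>Jun 9 - Jul 8, 2000</bill_period>
  <carrier>Sprint</carrier>
  <itemized_call no="1" date="JUN 10" number_called="973 555-8888" time="10:17pm" rate="NIGHT" min="1" amount="0.05" />
  <itemized_call no="2" date="JUN 13" number_called="973 650-2222" time="10:19pm" rate="NIGHT" min="1" amount="0.05" />
  <itemized_call no="3" date="JUN 15" number_called="206 365-9999" time="10:25pm" rate="NIGHT" min="3" amount="0.15" />
  <total>$0.25</total>
</invoice>
"""

root = ET.fromstring(INVOICE_XML)

# One tuple per call: (call number, number called, minutes, amount).
calls = [
    (int(c.get("no")), c.get("number_called"), int(c.get("min")), float(c.get("amount")))
    for c in root.findall("itemized_call")
]

# Sum the line items and compare with the <total> element.
computed_total = round(sum(amount for _, _, _, amount in calls), 2)
stated_total = root.findtext("total")

print(calls)
print(computed_total, stated_total)
```

Each tuple has the shape of a relational row, which is what makes storing such documents in an RDBMS a natural fit.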
04-19-2001 21
XML and Relational Database
Problem
• Many applications change their data very frequently, e.g., flight reservation, online billing, inventory.
Current Solution
• Reloading the complete XML document whenever it changes, which is very expensive.
Rainbow Solution
• Incrementally propagate XML document updates to the stored XML data.
• Goal: XML repository implemented using an RDBMS
• Approach: Flexible mapping
• Features:
  • DTD metadata management in the RDB
  • Automatic schema creation
  • Incremental update propagation
  • XML query optimization
04-19-2001 22
Rainbow Analysis
[Figure: Exp1: Batch vs Series. x-axis: # of Operations (0 to 10); y-axis: Time (s), 0 to 2.5; series: avg, serial, Linear (avg).]
04-19-2001 23
Rainbow Analysis (cont.)
[Figure: Time vs. Data Size. x-axis: Data Size (KB), 0 to 2500; y-axis: Time (s), 0.0 to 0.6; series: renameitem, renameattribute, pushupattribute, pushdownattribute.]
04-19-2001 24
HTML vs. XML
HTML
<h1>Car</h1>
<h2>Make</h2>
<p>Ford Mustang
<h2>Seats</h2>
<p>5
<h2>Top Speed</h2>
<p>70 m.p.h
XML
<h1>Car</h1>
<make>Ford Mustang</make>
<seats>5</seats>
<speed units="mph">70</speed>