Indexing Semistructured Data J. McHugh, J. Widom, S. Abiteboul, Q. Luo, and A. Rajaraman Stanford University January 1998 http://www-db.stanford.edu/lore/ EECS 684 02/21/2000 Presented by Weiming Zhou
Page 1: Indexing Semistructured Data

Indexing Semistructured Data

J. McHugh, J. Widom, S. Abiteboul, Q. Luo, and A. Rajaraman

Stanford University January 1998

http://www-db.stanford.edu/lore/

EECS 684 02/21/2000 Presented by Weiming Zhou

Page 2: Indexing Semistructured Data

Outline

• Introduction
  - Data Model
  - Query Language
• Indexes in Lore
• Query plans using indexes
• Conclusions

Page 3: Indexing Semistructured Data

Data Model - Object Exchange Model (OEM)
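
The figure for this slide did not survive, so here is a minimal Python sketch of an OEM database as a labeled, directed graph. The object ids and labels are reconstructed from the examples on the later slides. The root id `&1`, the helper names, and every atomic value other than “Harrison Ford” and 1956 are placeholders assumed for illustration, and the original figure may well contain more objects.

```python
# OEM in brief: every object has a unique id; complex objects have labeled
# edges to subobjects, atomic objects hold a value.
# (parent, label, child) triples reconstructed from the later slides.
EDGES = [
    ("&1", "Movie", "&2"), ("&1", "Movie", "&3"), ("&1", "Movie", "&4"),
    ("&2", "Title", "&5"), ("&2", "Actor", "&6"), ("&2", "Year", "&7"),
    ("&3", "Title", "&9"), ("&3", "Year", "&12"),
    ("&4", "Title", "&14"), ("&4", "Actor", "&13"),
    ("&6", "Name", "&17"), ("&13", "Name", "&21"),
]

# Atomic values. Only 1956 and "Harrison Ford" appear on the slides;
# the rest are placeholders.
VALUES = {
    "&5": "title-5", "&9": "title-9", "&14": "title-14",  # placeholders
    "&7": 1977,                                           # placeholder
    "&12": 1956,
    "&17": "Harrison Ford", "&21": "Harrison Ford",
}

# Named entry point into the graph (the root id is an assumption).
NAMES = {"DB": "&1"}


def children(oid, label):
    """Return the subobjects of `oid` reached via an edge labeled `label`."""
    return [c for p, l, c in EDGES if p == oid and l == label]


def is_atomic(oid):
    """Atomic objects carry a value; complex objects only carry edges."""
    return oid in VALUES
```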

Page 4: Indexing Semistructured Data

The Lorel Query Language (Lorel)

Example 1
select DB.Movie.Title
where DB.Movie.Actor.Name = “Harrison Ford”

Example 2
select T
from DB.Movie M, M.Title T
where exists A in M.Actor : exists N in A.Name : N = “Harrison Ford”

Page 5: Indexing Semistructured Data

Indexes In Lore

• Value index
• Text index
• Link index
• Path index
• Edge index

Page 6: Indexing Semistructured Data

Value index

Similar to attribute indexes in a relational DBMS.

Example

Suppose we create a Value index for DB.Movie.Year

A lookup for DB.Movie.Year = “1956” then returns object &12.
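
A Value index lookup can be sketched in Python as follows. This is an illustration under stated assumptions, not Lore's implementation: the index is keyed on the incoming label only (`Year`), the function names are hypothetical, and the value of object `&7` is a placeholder. Only `&12` = 1956 comes from the slide.

```python
import operator
from collections import defaultdict

# (parent, label, child) edges for the Year objects named on the slides.
EDGES = [("&2", "Year", "&7"), ("&3", "Year", "&12")]
VALUES = {"&7": 1977, "&12": 1956}  # 1977 is a placeholder value

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def build_value_index(edges, values):
    """Map each label to a sorted list of (value, oid) for atomic children."""
    index = defaultdict(list)
    for _parent, label, child in edges:
        if child in values:
            index[label].append((values[child], child))
    for entries in index.values():
        entries.sort()
    return index


def value_lookup(index, label, op, value):
    """Return ids of atomic objects with incoming `label` whose value satisfies `op value`."""
    return [oid for v, oid in index.get(label, []) if OPS[op](v, value)]


value_index = build_value_index(EDGES, VALUES)
print(value_lookup(value_index, "Year", "=", 1956))
```

A real system would keep the entries in a B+-tree or similar ordered structure so that range predicates avoid a full scan; the linear filter here only keeps the sketch short.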

Page 7: Indexing Semistructured Data

Text Index

• An information-retrieval style keyword search.
• Restricted by incoming labels.
• Locates string values containing specific words.
• Useful for strings containing a significant amount of text.

Implementation: inverted lists that map a given word w and label l to the list of atomic values with incoming edge l that contain w.

Example: Look up all objects with an atomic string value containing the word “Ford” and an incoming edge Name.
Results: {<&17, 2>, <&21, 2>}.
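
The inverted-list idea can be sketched in a few lines of Python. The function names are hypothetical, and reading the second component of each result pair as the 1-based position of the word within the string is an assumption; the slide does not say what the `2` means.

```python
from collections import defaultdict

# Name objects and their values, as used in the slide's example.
EDGES = [("&6", "Name", "&17"), ("&13", "Name", "&21")]
VALUES = {"&17": "Harrison Ford", "&21": "Harrison Ford"}


def build_text_index(edges, values):
    """Map (word, label) to a list of (oid, word position) pairs."""
    index = defaultdict(list)
    for _parent, label, child in edges:
        value = values.get(child)
        if isinstance(value, str):
            # Positions are 1-based (assumed interpretation of the slide).
            for position, word in enumerate(value.split(), start=1):
                index[(word.lower(), label)].append((child, position))
    return index


def text_lookup(index, word, label):
    """Return (oid, position) pairs for strings with incoming `label` containing `word`."""
    return index.get((word.lower(), label), [])


text_index = build_text_index(EDGES, VALUES)
print(text_lookup(text_index, "Ford", "Name"))
```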

Page 8: Indexing Semistructured Data

Link Index

• Locates parents of a given object.
• Serves as back-pointers.

Implementation
• Extendible hashing
• One Link Index for the entire database graph

Example: The Link Index lookup for object &17 returns parent object &6, and the lookup for object &21 returns object &13.
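
As a rough sketch, the Link Index is a single map from each child object to its parents. A plain Python `dict` stands in for the extendible hash table here, and the function names are hypothetical.

```python
from collections import defaultdict

# The two Name edges from the slide's example.
EDGES = [("&6", "Name", "&17"), ("&13", "Name", "&21")]


def build_link_index(edges):
    """One index for the whole graph: child oid -> list of parent oids."""
    index = defaultdict(list)
    for parent, _label, child in edges:
        index[child].append(parent)
    return index


def link_lookup(index, oid):
    """Return the parents of `oid` (empty list for a root or unknown object)."""
    return index.get(oid, [])


link_index = build_link_index(EDGES)
print(link_lookup(link_index, "&17"), link_lookup(link_index, "&21"))
```

Because OEM is a graph rather than a tree, an object may have several parents, which is why each entry is a list.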

Page 9: Indexing Semistructured Data

Path Index

Locates all objects reachable by a given labeled path.

Provided by the DataGuide.

Example
select DB.Movie.Title
The Path Index directly locates all objects reachable via DB.Movie.Title.

Results: &5; &9; &14.
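
The lookup can be illustrated with a simple path-to-objects map in Python. This is not a real DataGuide (which is a summary graph of the database); it is a depth-bounded enumeration of label paths that happens to answer the same lookup on a small example. The root id `&1`, the `max_depth` bound, and the function names are assumptions.

```python
from collections import defaultdict

# Graph reconstructed from the slides; the root id &1 is assumed.
EDGES = [
    ("&1", "Movie", "&2"), ("&1", "Movie", "&3"), ("&1", "Movie", "&4"),
    ("&2", "Title", "&5"), ("&3", "Title", "&9"), ("&4", "Title", "&14"),
    ("&2", "Actor", "&6"), ("&4", "Actor", "&13"),
    ("&6", "Name", "&17"), ("&13", "Name", "&21"),
]


def build_path_index(edges, root, root_name, max_depth=4):
    """Map each label path (as a tuple) of up to `max_depth` components to its target set."""
    children = defaultdict(list)
    for parent, label, child in edges:
        children[parent].append((label, child))

    index = defaultdict(set)
    frontier = [((root_name,), root)]
    for _ in range(max_depth):  # the bound keeps cyclic graphs from looping forever
        next_frontier = []
        for path, oid in frontier:
            index[path].add(oid)
            for label, child in children[oid]:
                next_frontier.append((path + (label,), child))
        frontier = next_frontier
    return index


def path_lookup(index, path):
    """Return the set of objects reachable via a dotted path such as 'DB.Movie.Title'."""
    return index.get(tuple(path.split(".")), set())


path_index = build_path_index(EDGES, "&1", "DB")
print(path_lookup(path_index, "DB.Movie.Title"))
```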

Page 10: Indexing Semistructured Data

Edge Index

Locates all parent-child pairs connected via a specified label.

Example

Look up label “Year” in Edge Index

Results: &2-&7, &3-&12
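
A minimal Python sketch of the Edge Index, assuming it is simply a map from a label to the parent-child pairs joined by that label (function names are hypothetical):

```python
from collections import defaultdict

# A few edges from the example graph, including both Year edges.
EDGES = [
    ("&2", "Title", "&5"), ("&2", "Year", "&7"),
    ("&3", "Title", "&9"), ("&3", "Year", "&12"),
]


def build_edge_index(edges):
    """Map each label to the list of (parent, child) pairs it connects."""
    index = defaultdict(list)
    for parent, label, child in edges:
        index[label].append((parent, child))
    return index


def edge_lookup(index, label):
    """Return all (parent, child) pairs connected by an edge labeled `label`."""
    return index.get(label, [])


edge_index = build_edge_index(EDGES)
print(edge_lookup(edge_index, "Year"))
```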

Page 11: Indexing Semistructured Data

Query Plans Using Indexes

• Top-Down
• Bottom-Up
• Hybrid

Example
select T
from DB.Movie M, M.Title T
where exists A in M.Actor : exists N in A.Name : N = “Harrison Ford”

Page 12: Indexing Semistructured Data

Top-Down Query Plan

Exhaustive top-down traversals
DB.Movie.Actor.Name = “Harrison Ford” → &17, &21
Link Index: &17 → &6 → &2, &21 → &13 → &4
DB.Movie.Title → &5, &14
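
For comparison with the index-based plans, a purely top-down evaluation of the example query can be sketched as nested traversals from the root. The graph is reconstructed from the slides, the root id `&1` is assumed, and the function names are hypothetical.

```python
# Graph reconstructed from the slides; the root id &1 is assumed.
EDGES = [
    ("&1", "Movie", "&2"), ("&1", "Movie", "&3"), ("&1", "Movie", "&4"),
    ("&2", "Title", "&5"), ("&3", "Title", "&9"), ("&4", "Title", "&14"),
    ("&2", "Actor", "&6"), ("&4", "Actor", "&13"),
    ("&6", "Name", "&17"), ("&13", "Name", "&21"),
]
VALUES = {"&17": "Harrison Ford", "&21": "Harrison Ford"}


def children(oid, label):
    """Subobjects of `oid` reached via an edge labeled `label`."""
    return [c for p, l, c in EDGES if p == oid and l == label]


def top_down(root, wanted="Harrison Ford"):
    """Visit every Movie, test its Actor.Name values, then collect its Titles."""
    titles = []
    for movie in children(root, "Movie"):
        has_match = any(
            VALUES.get(name) == wanted
            for actor in children(movie, "Actor")
            for name in children(actor, "Name")
        )
        if has_match:
            titles.extend(children(movie, "Title"))
    return titles


print(top_down("&1"))
```

Every Movie, Actor, and Name object is visited regardless of how few match, which is the cost the bottom-up plan tries to avoid.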

Page 13: Indexing Semistructured Data

Bottom-Up Query Plan

Look up Value Index: DB.Movie.Actor.Name = “Harrison Ford” → &17, &21
Link Index: &17 → &6 → &2, &21 → &13 → &4
DB.Movie.Title → &5, &14
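
The bottom-up steps can be sketched in Python as a value lookup followed by Link Index climbs and a final descent to the titles. The scan in `value_lookup` stands in for a real Value Index, the root id `&1` is assumed, and all function names are hypothetical.

```python
from collections import defaultdict

# Graph reconstructed from the slides; the root id &1 is assumed.
EDGES = [
    ("&1", "Movie", "&2"), ("&1", "Movie", "&3"), ("&1", "Movie", "&4"),
    ("&2", "Title", "&5"), ("&3", "Title", "&9"), ("&4", "Title", "&14"),
    ("&2", "Actor", "&6"), ("&4", "Actor", "&13"),
    ("&6", "Name", "&17"), ("&13", "Name", "&21"),
]
VALUES = {"&17": "Harrison Ford", "&21": "Harrison Ford"}

# Link Index stand-in: child -> [(label, parent), ...]
PARENTS = defaultdict(list)
for parent, label, child in EDGES:
    PARENTS[child].append((label, parent))


def value_lookup(label, value):
    """Stand-in for the Value Index: atomic objects with this label and value."""
    return {c for _p, l, c in EDGES if l == label and VALUES.get(c) == value}


def climb(oids, label):
    """One Link Index step: parents reached by going up an edge labeled `label`."""
    return {p for o in oids for l, p in PARENTS[o] if l == label}


def bottom_up(root):
    names = value_lookup("Name", "Harrison Ford")   # &17, &21
    actors = climb(names, "Name")                   # &6, &13
    movies = climb(actors, "Actor")                 # &2, &4
    # Keep only objects that really are DB.Movie, i.e. Movie children of the root.
    movies = {m for m in movies if ("Movie", root) in PARENTS[m]}
    return {c for p, l, c in EDGES if p in movies and l == "Title"}


print(bottom_up("&1"))
```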

Page 14: Indexing Semistructured Data

Hybrid Query Plan

select X
from A.B X
where exists Y in X.C : Y = 5

Bottom-up: Value Index A.B.C = “5”

Top-down: A.B

Intersect
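
A small Python sketch of the hybrid plan: one branch works bottom-up from the value predicate, the other top-down from the root, and the two candidate sets are intersected. The graph here is entirely hypothetical (the slide gives no data for this query), as are the object ids and function names.

```python
# Hypothetical graph: root &1 is named A; &4 hangs off a D edge, not a B edge.
EDGES = [
    ("&1", "B", "&2"), ("&1", "B", "&3"), ("&1", "D", "&4"),
    ("&2", "C", "&5"), ("&3", "C", "&6"), ("&4", "C", "&7"),
]
VALUES = {"&5": 5, "&6": 7, "&7": 5}


def bottom_up_candidates(label, value):
    """Value lookup on `label` = value, then one step up to the parents."""
    return {p for p, l, c in EDGES if l == label and VALUES.get(c) == value}


def top_down_candidates(root, label):
    """All objects reachable from the root via one edge labeled `label`."""
    return {c for p, l, c in EDGES if p == root and l == label}


def hybrid(root):
    from_values = bottom_up_candidates("C", 5)   # parents of C objects equal to 5
    from_root = top_down_candidates(root, "B")   # every A.B object
    return from_values & from_root               # intersect


print(hybrid("&1"))
```

The intersection discards `&4`, which has a matching C child but is not reachable via A.B, and `&3`, which is an A.B object without a matching C child.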

Page 15: Indexing Semistructured Data

Conclusions

• Presents Lore’s indexing structures: Value Index, Text Index, Link Index, Path Index, and Edge Index.
• Query plans using indexes
• Preliminary performance results: at least an order of magnitude improvement when indexes are used for query processing.