Web Data Management: Advanced Database Presentation. By: Navid Sedighpour. Professor: Dr. Alireza Bagheri. November 2015

Scalable Web Data Management using RDF

Feb 16, 2017

Page 1: Scalable Web Data Management using RDF

Web Data Management

Advanced Database Presentation

By: Navid Sedighpour

Professor: Dr. Alireza Bagheri

November 2015

Page 2: Scalable Web Data Management using RDF

Interest

Lack of schema: Data is unstructured or at best “semi-structured”. Missing data, additional attributes, similar data but not identical.

Volatility: May conform to one schema now, but not later.

Scale: How to capture everything?

Querying difficulty: What is the user language? What are the primitives? Aren’t search engines sufficient?

Outline: Introduction, Naïve Triple Store Design, Property Tables, Binary Tables, Graph-Based, Conclusion

Page 3: Scalable Web Data Management using RDF

More Recent Approaches to Web Querying

Fusion Tables: Users contribute data in spreadsheets. Possible joins between multiple data sets. Extensive visualization.

Page 4: Scalable Web Data Management using RDF

More Recent Approaches to Web Querying

XML: Data exchange language. Tree-based structure.

Page 5: Scalable Web Data Management using RDF

More Recent Approaches to Web Querying

RDF: W3C Recommendation. Simple, self-descriptive model.

Page 6: Scalable Web Data Management using RDF

RDF Data Volumes

90% of the world's data was generated over the last two years

Data are growing fast

Size almost doubling every year

Page 7: Scalable Web Data Management using RDF

RDF Data Volumes

March 2009: 89 datasets

Page 8: Scalable Web Data Management using RDF

RDF Data Volumes

September 2010: 203 datasets

Page 9: Scalable Web Data Management using RDF

RDF Data Volumes

September 2011: 295 datasets

Page 10: Scalable Web Data Management using RDF

RDF Data Volumes

April 2014: 1091 datasets

Page 11: Scalable Web Data Management using RDF

RDF Introduction

Everything is a uniquely named resource

Prefixes can be used to shorten names

Properties of resources can be defined

Relationships with other resources can be defined

Resource descriptions can be contributed by different people/groups and can be located anywhere on the web

Integrated web “database”
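A minimal sketch of how prefixes shorten resource names. The prefix table, the `ex:` namespace, and the helper names `expand` and `shorten` are assumptions made for illustration; they do not come from the slides.

```python
# Hypothetical prefix table: maps a short prefix to a namespace URI.
PREFIXES = {
    "ex": "http://example.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name  # full URI or unknown prefix: leave unchanged


def shorten(uri, prefixes=PREFIXES):
    """Replace a known namespace at the start of a URI with its prefix."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return uri


full = expand("foaf:name")
short = shorten("http://example.org/resource/Alice")
```

Both forms name the same resource; the prefixed form is only a shorthand for the full URI.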

Page 12: Scalable Web Data Management using RDF

RDF Data Model

Triple: Subject, Predicate (Property), Object

Subject: the entity that is described (URI or blank node)

Predicate: a feature of the entity

Object: the value of the feature

A set of RDF triples is called an “RDF graph”
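The data model can be sketched as a set of three-field tuples. The resources and properties below (`ex:Alice`, `ex:knows`, and so on) are invented for illustration, and `describe` is a hypothetical helper.

```python
from collections import namedtuple

# One RDF statement: subject, predicate (property), object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# An RDF graph is a set of triples (hypothetical example data).
rdf_graph = {
    Triple("ex:Alice", "rdf:type", "ex:Person"),
    Triple("ex:Alice", "ex:name", '"Alice"'),
    Triple("ex:Alice", "ex:knows", "ex:Bob"),
    Triple("ex:Bob", "ex:name", '"Bob"'),
}


def describe(graph, subject):
    """Return the sorted (predicate, object) pairs stated about one subject."""
    return sorted((t.predicate, t.object) for t in graph if t.subject == subject)


bob_description = describe(rdf_graph, "ex:Bob")
```

Read as a graph, each triple is a directed edge from the subject to the object, labelled with the predicate.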

Page 13: Scalable Web Data Management using RDF

RDF Example Instance

Page 14: Scalable Web Data Management using RDF

RDF Graph

Page 15: Scalable Web Data Management using RDF

SPARQL Queries

Page 16: Scalable Web Data Management using RDF

Naïve Triple Store Design

Page 17: Scalable Web Data Management using RDF

Naïve Triple Store Design

Easy to implement, but too many self-joins
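The self-join problem can be illustrated with a single three-column table in SQLite. This is only a sketch: the table name `triples`, the data, and the query are assumptions for illustration, not the schema of any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The naive design: every triple is one row of one table.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:book1", "rdf:type", "ex:Book"),
        ("ex:book1", "ex:title", "RDF Basics"),
        ("ex:book1", "ex:author", "ex:alice"),
        ("ex:book2", "rdf:type", "ex:Book"),
        ("ex:book2", "ex:title", "Graph Stores"),
        ("ex:alice", "ex:name", "Alice"),
    ],
)

# "Title and author name of every book" needs four triple patterns,
# so the table is joined with itself three times.
query = """
SELECT t2.object AS title, t4.object AS author_name
FROM triples AS t1
JOIN triples AS t2 ON t2.subject = t1.subject
JOIN triples AS t3 ON t3.subject = t1.subject
JOIN triples AS t4 ON t4.subject = t3.object
WHERE t1.predicate = 'rdf:type' AND t1.object = 'ex:Book'
  AND t2.predicate = 'ex:title'
  AND t3.predicate = 'ex:author'
  AND t4.predicate = 'ex:name'
"""
rows = conn.execute(query).fetchall()
```

Each additional triple pattern in a query adds another self-join over the same large table.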

Page 18: Scalable Web Data Management using RDF

Property Tables

Grouping by entities

Types: Clustered Property Tables, Property Class Tables

Page 19: Scalable Web Data Management using RDF

Clustered Property Tables

Group together the properties that tend to occur in the same (or similar) subjects
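A minimal sketch of one clustered property table in SQLite, assuming that the properties `ex:title`, `ex:author`, and `ex:year` tend to occur together. The table name `book_props`, the column mapping, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One wide row per subject; one column per clustered property.
conn.execute(
    "CREATE TABLE book_props ("
    "subject TEXT PRIMARY KEY, title TEXT, author TEXT, year INTEGER)"
)

triples = [
    ("ex:book1", "ex:title", "RDF Basics"),
    ("ex:book1", "ex:author", "ex:alice"),
    ("ex:book1", "ex:year", 2011),
    ("ex:book2", "ex:title", "Graph Stores"),
]
# Fixed mapping from property to column (assumed to come from the clustering step).
COLUMN_OF = {"ex:title": "title", "ex:author": "author", "ex:year": "year"}

for s, p, o in triples:
    conn.execute("INSERT OR IGNORE INTO book_props (subject) VALUES (?)", (s,))
    conn.execute(
        f"UPDATE book_props SET {COLUMN_OF[p]} = ? WHERE subject = ?", (o, s)
    )

# Several properties of a subject are read from one row, with no join.
rows = conn.execute(
    "SELECT subject, title, author, year FROM book_props ORDER BY subject"
).fetchall()
```

The second row shows the cost of this layout: a subject that lacks a clustered property gets a NULL in that column.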

Page 20: Scalable Web Data Management using RDF

Property Class Tables

Cluster the subjects with the same type of property into one property table
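One way to sketch this grouping is to use each subject's `rdf:type` as the key, which is an assumption about how the classes are chosen. The sketch also assumes one type per subject and single-valued properties; `build_class_tables` and the data are hypothetical.

```python
from collections import defaultdict

triples = [
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:book1", "ex:title", "RDF Basics"),
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:book2", "rdf:type", "ex:Book"),
    ("ex:book2", "ex:title", "Graph Stores"),
]


def build_class_tables(triples):
    """Group subjects by class: class -> {subject -> {property -> value}}."""
    type_of = {s: o for s, p, o in triples if p == "rdf:type"}
    tables = defaultdict(dict)
    for s, p, o in triples:
        if p == "rdf:type" or s not in type_of:
            continue  # untyped subjects would need a fallback table
        tables[type_of[s]].setdefault(s, {})[p] = o
    return dict(tables)


class_tables = build_class_tables(triples)
```

Each key of `class_tables` corresponds to one property table whose rows are the subjects of that class.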

Page 21: Scalable Web Data Management using RDF

Property Tables

Advantages: Fewer joins.

Disadvantages: Lots of NULLs. Clustering is not trivial. Multi-valued properties are complicated.

Page 22: Scalable Web Data Management using RDF

Binary Tables

Grouping by properties: for each property, build a two-column table containing both subject and object, ordered by subject

Also called the “Vertically Partitioned Approach”

n two-column tables (n is the number of unique properties in the data)
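A minimal in-memory sketch of this partitioning, with each two-column table held as a sorted Python list. The function name `vertically_partition` and the data are assumptions for illustration.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Build one (subject, object) table per property, ordered by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


triples = [
    ("ex:book1", "ex:title", "RDF Basics"),
    ("ex:book2", "ex:title", "Graph Stores"),
    ("ex:book1", "ex:author", "ex:bob"),
    ("ex:book1", "ex:author", "ex:alice"),
]
tables = vertically_partition(triples)
```

A multi-valued property such as `ex:author` simply contributes several rows to its table, and a subject without a property contributes none, so no NULLs are stored.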

Page 23: Scalable Web Data Management using RDF

Binary Tables

Advantages: Supports multi-valued properties. No NULLs. No clustering. Good performance for subject-subject joins.

Disadvantages: Not useful for subject-object joins. Expensive inserts.
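Subject-subject joins are cheap here because every table is already ordered by subject, so two tables can be combined in one linear merge pass. The sketch below assumes each table is a subject-sorted list of (subject, object) pairs; `merge_join_on_subject` and the data are hypothetical.

```python
def merge_join_on_subject(left, right):
    """Merge-join two subject-sorted (subject, object) tables on subject."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows sharing this subject in each table.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            # Emit every combination of the two runs.
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    result.append((ls, left_obj, right_obj))
            i, j = i_end, j_end
    return result


title = [("ex:book1", "RDF Basics"), ("ex:book2", "Graph Stores")]
author = [("ex:book1", "ex:alice"), ("ex:book1", "ex:bob"), ("ex:book3", "ex:carol")]
joined = merge_join_on_subject(title, author)
```

A join on the object column cannot use this ordering directly, since the tables are sorted by subject only.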

Page 24: Scalable Web Data Management using RDF

Graph-Based Approach

Answering a SPARQL query = subgraph matching

gStore
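The equivalence can be illustrated with a naive backtracking matcher over triple patterns. This is a sketch of the general idea only, not gStore's algorithm; the `?`-prefixed variable convention, the function `match`, and the data are assumptions.

```python
def match(patterns, graph, binding=None):
    """Yield each variable binding under which all patterns occur in graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new_binding = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if new_binding.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new_binding)


graph = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
]
# Query graph: ?x -knows-> ?y -name-> ?n
query = [("?x", "ex:knows", "?y"), ("?y", "ex:name", "?n")]
results = list(match(query, graph))
```

Each result maps the query graph's variables onto nodes of the data graph so that every query edge has a matching data edge.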

Page 25: Scalable Web Data Management using RDF

Graph-Based Approach

Two steps need to be done:
1. For each node of Q*, get the list of nodes in G* that include that node
2. Do a multi-way join to get the candidate list

Alternatives:

Sequential scan of G*: Both steps are inefficient.

S-Tree: Height-balanced tree over signatures. Run an inclusion query for each node of Q* and get the lists of nodes in G* that include that node (q & s = q).

VS-Tree: Supports both steps efficiently. Grouping by vertices.
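The inclusion test q & s = q can be sketched with bit-string signatures and a sequential scan. The encoding below (one CRC32-selected bit per edge label and per neighbour, 64-bit signatures) is an assumption chosen for brevity and is not gStore's actual encoding; all names and data are hypothetical.

```python
import zlib

NBITS = 64  # assumed signature length


def bit_for(text):
    """Map a string to one bit position of the signature."""
    return 1 << (zlib.crc32(text.encode("utf-8")) % NBITS)


def signature(edges):
    """Encode a vertex from its (predicate, neighbour) edges.

    A neighbour of None stands for a query variable and sets no bit.
    """
    sig = 0
    for predicate, neighbour in edges:
        sig |= bit_for("p:" + predicate)
        if neighbour is not None:
            sig |= bit_for("o:" + neighbour)
    return sig


def candidates(query_sig, data_sigs):
    """Sequential scan: keep vertices whose signature includes the query's."""
    return [v for v, s in data_sigs.items() if (query_sig & s) == query_sig]


data_sigs = {
    "ex:alice": signature([("ex:knows", "ex:bob"), ("ex:name", "Alice")]),
    "ex:bob": signature([("ex:name", "Bob")]),
}
# Query vertex ?x with edges: ?x ex:knows ?y . ?x ex:name "Alice"
q = signature([("ex:knows", None), ("ex:name", "Alice")])
found = candidates(q, data_sigs)
```

The filter never drops a true match, but hash collisions can let through vertices that do not match, which is why the result is only a candidate list to be verified afterwards.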

Page 26: Scalable Web Data Management using RDF

S-Tree

Pruning
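Pruning can be sketched on a small hand-built signature tree. The sketch assumes that an inner node's signature is the bitwise OR of its children's signatures, so a failed inclusion test at an inner node rules out its whole subtree. The `Node` class, the 4-bit signatures, and the tree shape are hypothetical, and no balancing is implemented.

```python
class Node:
    """A signature-tree node: a leaf holds a vertex, an inner node holds children."""

    def __init__(self, children=None, vertex=None, sig=0):
        self.children = children or []
        self.vertex = vertex
        self.sig = sig
        # An inner node's signature is the OR of its children's signatures.
        for child in self.children:
            self.sig |= child.sig


def search(node, q, visited=None):
    """Return the vertices whose signature includes q, pruning failed subtrees."""
    if visited is not None:
        visited.append(node)
    if (q & node.sig) != q:
        return []  # prune: nothing below this node can include q
    if node.vertex is not None:
        return [node.vertex]
    found = []
    for child in node.children:
        found.extend(search(child, q, visited))
    return found


left = Node(children=[Node(vertex="v1", sig=0b0011), Node(vertex="v2", sig=0b0001)])
right = Node(children=[Node(vertex="v3", sig=0b1100), Node(vertex="v4", sig=0b1000)])
root = Node(children=[left, right])

visited = []
matches = search(root, 0b1000, visited)
```

For the query signature `0b1000`, the left subtree fails the test at its inner node, so the leaves `v1` and `v2` are never examined.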

Page 27: Scalable Web Data Management using RDF

S-Tree

Page 28: Scalable Web Data Management using RDF

S-Tree

Page 29: Scalable Web Data Management using RDF

S-Tree

Page 30: Scalable Web Data Management using RDF

S-Tree

Page 31: Scalable Web Data Management using RDF

VS-Tree

Page 32: Scalable Web Data Management using RDF

VS-Tree

Page 33: Scalable Web Data Management using RDF

Conclusion

RDF data seems to hold considerable promise for web data management

We discussed four approaches to web data management with RDF: naïve triple store design, property tables, binary tables, and the graph-based approach

The VS-Tree has the best performance among the graph-based approaches

gStore is more efficient than the other approaches


Page 35: Scalable Web Data Management using RDF

Thanks

Any questions?