ICS-FORTH & Univ. of Crete Gregory Karvounarakis May 2001
On Storing Voluminous RDF Descriptions: The case of Web Portal Catalogs
(http://www.ics.forth.gr/proj/isst/RDF)
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis
Computer Science Department, University of Crete
and Institute for Computer Science - FORTH, Heraklion, Crete, Greece
Portalmania!
Internet Portals Example: The Open Directory
Browsing the ODP Topics
Searching the ODP Topics & URLs
• Descriptions in ODP consist of the classification of URIs under topics, a textual description, and various administrative information
ODP Search Results: Hotel Paris Orsay
What is needed?
• Flexible Modeling of Web Portal Catalogs using W3C standards (RDF/S)
  • Exploit existing forms of domain knowledge
    • Ranging from simple vocabularies to formal ontologies
  • Describe information resources in various ways
    • Administration, Classification, Content Rating, Channels, ...
• Secondary Storage Management of Portal Metadata
  • Large Schemas: e.g., 170 Mbytes of ODP Topics (the Art Hierarchy contains 25315 terms)
  • Voluminous Description Bases: e.g., 700 Mbytes of ODP indexed sites (2,342,978 URLs)
• Declarative Query Languages for Portal Catalogs
  • Interleave schema with data querying
  • Optimize access to Portal Catalogs
Our Approach
[Figure: High-level access to community information; labels: Virtual XML Warehouse, RDF, Archives, Documents, Databases, Web]
• Use W3C Standards to describe (RDF/S) & exchange (XML) information
• Our Main Contribution: Declarative Languages for Browsing & Querying
Outline
• The Open Directory Portal: a case study
• The RDF Query Language (RQL)
• RDF Storage Strategies
• Testbed: the ODP RDF dump
  • Representative queries
  • Performance
• Summary and Outlook
Resource Description Framework (RDF/S)
• RDF: Resource Descriptions
  • Data Model: Directed Labeled Graphs
    • Nodes: Resources (URIs) or Literals
    • Edges: Properties – Attributes or Relationships
    • Labels: Nodes (Class names) and Edges (Property names)
    • Statement: assertion of the form resource, property, value
    • Description: collection of statements concerning a resource
  • Specialization of both classes & properties (simple & multiple)
  • Multiple classification under several classes
  • Unordered, optional, and multi-valued properties
  • Domain and range polymorphism of properties
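The data model on this slide can be illustrated with a minimal Python sketch. The `Statement` tuple, the `describe` helper, and all URIs, class names and property names are hypothetical illustrations, not taken from the slides or from the ODP dump.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: an assertion of the form (resource, property, value)."""
    resource: str  # URI of the described resource (a graph node)
    prop: str      # property name (an edge label)
    value: str     # another resource/class name or a literal


# Hypothetical statements, made up for illustration only.
STATEMENTS = [
    Statement("http://example.org/hotel1", "rdf:type", "Hotel"),
    Statement("http://example.org/hotel1", "rdf:type", "Paris"),       # multiple classification
    Statement("http://example.org/hotel1", "title", "Hotel Example"),
    Statement("http://example.org/hotel1", "title", "Example Hotel"),  # multi-valued property
    Statement("http://example.org/site2", "rdf:type", "Hotel"),        # no title: optional property
]


def describe(statements):
    """Group statements by resource: a description is the collection of
    statements concerning one resource."""
    descriptions = defaultdict(list)
    for s in statements:
        descriptions[s.resource].append((s.prop, s.value))
    return dict(descriptions)


descriptions = describe(STATEMENTS)
```

Here `descriptions` maps each resource URI to its list of (property, value) pairs, which mirrors the "unordered, optional, and multi-valued" nature of RDF properties: the two resources carry different sets of properties even though both are classified under `Hotel`.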
RDF/S vs. Well-Known Formalisms
• Relational or Object Database Models (ODMG, SQL)
  • Classes don't define table or object types
  • Instances may have quite different associated properties
  • Collections with heterogeneous members
• Semistructured or XML Data Models (OEM, UnQL, YAT, XML Schema)
  • Labels on both nodes and edges
  • Schema class and property subsumption is not captured
  • Heterogeneous descriptions reminiscent of SGML exceptions
• Knowledge Representation Languages (Telos, DL, F-Logic)
  • Absence of complex values and n-ary relationships (bags, sequences)
The RDF Query Language: RQL
• Declarative query language for RDF description bases
  • relies on a typed data model (see SemWeb2001 paper)
  • follows a functional approach (basic queries and filters)
  • adapts the functionality of semistructured or XML query languages to RDF, but also:
    • treats properties as self-existent individuals
    • exploits taxonomies of node and edge labels
    • allows querying of schemas as semistructured data
• Relational interpretation of schemas & resource descriptions
Portal Navigation with RQL
• Browsing large description bases is cumbersome!
• RQL provides powerful path expressions permitting filtering and navigation on both portal schemas and resource descriptions
• E.g., to find (under the Regional ODP hierarchy) URIs of Hotels in ...

    select $X
    from Regional{:$X}
    where $X like "*Hotel*"
    and $X < Paris
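One way to read the query fragment on this slide is sketched below in Python. This is only an interpretation under stated assumptions: `$X` is taken to range over classes below `Regional`, `like` is treated as a wildcard match on the class name, and `<` as the strict subclass relation. The topic names in `SUPERCLASSES` are hypothetical and are not taken from the ODP dump.

```python
from fnmatch import fnmatchcase

# Hypothetical fragment of a topic hierarchy: class -> direct superclasses.
SUPERCLASSES = {
    "Regional": [],
    "Europe": ["Regional"],
    "France": ["Europe"],
    "Paris": ["France"],
    "Paris_Hotels": ["Paris"],
    "Paris_Museums": ["Paris"],
    "London": ["Europe"],
    "London_Hotels": ["London"],
}


def is_subclass(cls, ancestor):
    """True if cls is a strict (direct or transitive) subclass of ancestor."""
    stack = list(SUPERCLASSES.get(cls, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, []))
    return False


def hotel_classes_under_paris():
    """Assumed reading of the query: classes below Regional whose name
    matches *Hotel* and which are subclasses of Paris."""
    return sorted(
        x
        for x in SUPERCLASSES
        if is_subclass(x, "Regional")      # from Regional{:$X}  (assumption)
        and fnmatchcase(x, "*Hotel*")      # where $X like "*Hotel*"
        and is_subclass(x, "Paris")        # and $X < Paris
    )
```

The point of the sketch is that the filter operates on the schema (the topic taxonomy) rather than on individual resource descriptions, which is what makes navigation over a very large catalog manageable.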
DBMS Size vs. Schema Triples
→ DBMS size scales linearly with the number of schema triples

                                            SpecRepr                   GenRepr
Aver. triple size (with indexes)            0.086 KB (0.1734 KB)       0.1582 KB (0.3062 KB)
Aver. triple storage time (with indexes)    0.0021 sec (0.0025 sec)    0.0025 sec (0.0032 sec)
DBMS Size vs. Data Triples
→ DBMS size scales linearly with the number of data triples

                                            SpecRepr                   GenRepr
Aver. triple size (with indexes)            0.123 KB (0.2566 KB)       0.123 KB (0.2706 KB)
Aver. triple storage time (with indexes)    0.0033 sec (0.0043 sec)    0.0039 sec (0.00457 sec)
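Since the size is reported to scale linearly with the number of triples, a rough back-of-the-envelope estimate is simply the triple count times the average triple size. The sketch below uses the averages from the two tables above; the function name `estimated_size_mb` and the purely proportional model (no fixed overhead) are assumptions made for illustration.

```python
# Average per-triple sizes in KB, copied from the schema and data tables:
# (without indexes, with indexes) for each representation.
AVG_TRIPLE_KB = {
    ("schema", "SpecRepr"): (0.086, 0.1734),
    ("schema", "GenRepr"): (0.1582, 0.3062),
    ("data", "SpecRepr"): (0.123, 0.2566),
    ("data", "GenRepr"): (0.123, 0.2706),
}


def estimated_size_mb(kind, representation, n_triples, with_indexes=True):
    """Estimate database size in MB as n_triples * average triple size.

    Assumes a purely proportional model; any fixed DBMS overhead is ignored.
    """
    without_idx, with_idx = AVG_TRIPLE_KB[(kind, representation)]
    per_triple_kb = with_idx if with_indexes else without_idx
    return n_triples * per_triple_kb / 1024.0
```

For example, `estimated_size_mb("data", "GenRepr", 1_000_000)` gives the projected indexed size for one million data triples in the generic representation, and comparing it with the `"SpecRepr"` value shows how much of the difference comes from indexes rather than from the triples themselves.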
Query Templates for RDF description bases
Pure schema queries
  Q1  Find the range (or domain) of a property
  Q2  Find the direct subclasses of a class
  Q3  Find the transitive subclasses of a class
  Q4  Check if a class is a subclass of another class
Queries on resource descriptions using available schema knowledge
  Q5  Find the direct extent of a class (or property)
  Q6  Find the transitive extent of a class (or property)
  Q7  Find if a resource is an instance of a class
  Q8  Find the resources having a property with a specific (or range of) value(s)
  Q9  Find the instances of a class having a given property
Schema queries for specific resource descriptions
  Q10 Find the properties of a resource and their values
  Q11 Find the classes under which a resource is classified
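A few of these templates can be sketched over an in-memory set of triples. This is a minimal illustration in Python, not the storage schemes or SQL evaluated in the talk; the class names, resource identifiers and function names are all hypothetical.

```python
# Toy schema and data; all names are hypothetical.
SUBCLASS_OF = {("Hotel", "Lodging"), ("Hostel", "Lodging"), ("Lodging", "Business")}
TYPE_OF = {("r1", "Hotel"), ("r2", "Hostel"), ("r3", "Business")}


def direct_subclasses(cls):
    """Q2: direct subclasses of a class."""
    return {sub for sub, sup in SUBCLASS_OF if sup == cls}


def transitive_subclasses(cls):
    """Q3: all direct and indirect subclasses of a class."""
    result, frontier = set(), [cls]
    while frontier:
        for sub in direct_subclasses(frontier.pop()):
            if sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def is_subclass(sub, sup):
    """Q4: check whether sub is a (transitive) subclass of sup."""
    return sub in transitive_subclasses(sup)


def direct_extent(cls):
    """Q5: resources classified directly under a class."""
    return {r for r, c in TYPE_OF if c == cls}


def transitive_extent(cls):
    """Q6: resources classified under a class or any of its subclasses."""
    classes = {cls} | transitive_subclasses(cls)
    return {r for r, c in TYPE_OF if c in classes}


def classes_of(resource):
    """Q11: classes under which a resource is directly classified."""
    return {c for r, c in TYPE_OF if r == resource}
```

The contrast between `direct_extent` and `transitive_extent` shows why the templates separate direct from transitive variants: the transitive versions require traversing the subclass hierarchy, which is where the choice of storage representation is likely to matter most.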
Execution Time of RDF Benchmark Queries
Query   Generic   Specific
Case 1  Case 2  Case 3  Case 1  Case 2  Case 3
Q1  0.0015  0.0012