© 2008 OpenLink Software, All rights reserved Open Conceptual Data Models Making the Conceptual Layer Real via RDF Linked Data
Page 1: Open Conceptual Data Models

© 2008 OpenLink Software, All rights reserved

Open Conceptual Data Models

Making the Conceptual Layer Real via RDF Linked Data

Page 2: Open Conceptual Data Models

Conceptual Data Models in the Linked Data Web

Linked Data Vision: The transition of the Web from a Web of linked documents to a Web of interlinked structured data items (aka entities, data objects, resources)

Concurrent trend in the IT industry: A recognition of the benefits of conceptual data models over logical data models

The Big Question: To what extent does Linked Data support conceptual-level data models?

Page 3: Open Conceptual Data Models

Open Conceptual Data Models

Topics:
  Conceptual & Logical Data Models
  Conceptual Models for the Semantic Web
  Realizing Conceptual Models through Ontologies & Linked Data
  Virtuoso RDF Views
  ADO.NET Data Services & the Entity Data Model

Page 4: Open Conceptual Data Models

Conceptual & Logical Data Models

Describe a software system’s target problem space

Typically, in today’s database-driven applications

Three levels of data model
  Physical
    How data is physically represented on disk
  Logical (aka logical schema)
    Expresses problem domain in terms of data management technology (tables / columns), e.g. relational schema
  Conceptual (aka conceptual schema)
    Purely semantic description of problem space
    Describes things (entities), their characteristics (attributes) & associations between things (relationships)

Page 5: Open Conceptual Data Models

Logical Data Model

Most prominent of the three data model types

Main focus of database applications
  Due to pervasiveness of SQL in application code

Weaknesses
  Impedance mismatch
  Loss of semantics during development process
  Heterogeneous databases & interoperability

Page 6: Open Conceptual Data Models

Logical Data Model Weaknesses

Impedance Mismatch
  SQL expresses queries in terms of tables / views => targets logical schema
  Normalization fragments the data model
    Entities & their attributes may be split across several tables
    Navigation between objects requires relational joins over two or more tables
    Table rows must be reconstituted into higher level conceptual entities

Conceptual level data model is desirable to:
  Remove impedance mismatch
  Isolate application from changes to logical data model
  Provide framework for human level interaction
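The impedance mismatch described on this slide can be sketched in a few lines. The schema below is a hypothetical, cut-down customers/orders pair (table and column names are assumptions for illustration, not taken from any real database); it shows that reading back one conceptual entity needs a join plus hand-written code to reassemble the rows.

```python
import sqlite3

# Hypothetical normalized schema: one conceptual "Customer" entity is
# split across two tables, so reading it back requires a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id TEXT PRIMARY KEY, company_name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id TEXT REFERENCES customers(id)
    );
    INSERT INTO customers VALUES ('ALFKI', 'Alfreds Futterkiste');
    INSERT INTO orders VALUES (10643, 'ALFKI');
    INSERT INTO orders VALUES (10692, 'ALFKI');
""")


def load_customer(conn, customer_id):
    """Reconstitute one conceptual entity from joined table rows."""
    rows = conn.execute(
        """
        SELECT c.id, c.company_name, o.id
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        WHERE c.id = ?
        ORDER BY o.id
        """,
        (customer_id,),
    ).fetchall()
    if not rows:
        return None
    # The application, not the database, knows that these rows form one entity.
    entity = {"id": rows[0][0], "company_name": rows[0][1], "orders": []}
    for _, _, order_id in rows:
        if order_id is not None:
            entity["orders"].append(order_id)
    return entity


customer = load_customer(conn, "ALFKI")
```

The knowledge that a customer "has" orders lives only in `load_customer`; nothing in the schema states it, which is the gap a conceptual model is meant to close.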

Page 7: Open Conceptual Data Models

Logical Data Model Weaknesses

Loss of Semantics During Development

Process:
  Develop conceptual model (E-R modelling)
  Transform to logical model for implementation
  Derive physical model from logical model

Problems:
  Each move to a lower level model discards meaning
  Higher level model typically not retained
  Model semantics fragmented across schema / business rules / application code
  Application must know logical data model
    Must be hardcoded or inferred (imperfectly) from system tables

Page 8: Open Conceptual Data Models

Logical Data Model Weaknesses

Heterogeneous Databases & Interoperability

Logical data model
  Describes problem domain in terms of tables/columns
  Requires SQL to navigate model

Application
  Exposed to specifics of a particular vendor’s RDBMS
  In heterogeneous database environment, must handle
    Different SQL dialects
    Different schemas

No explicit data model. No explicit semantics.

Interoperability/integration = perpetual problem for IT depts

Page 9: Open Conceptual Data Models

Conceptual Models for the Semantic Web

Growing recognition in the industry of the benefits of a conceptual, rather than logical, model for data-centric applications
  e.g. Microsoft’s Entity Data Model / Entity Framework

Semantic Web technologies provide powerful tools for this paradigm shift

Page 10: Open Conceptual Data Models

Benefits of Conceptual Models

How the Semantic Web benefits

More faithfully represents human view of domain of interest

Conceptual model & semantics
  Explicit & available globally
  Not implicit & fragmented across business logic / UI etc

Better / explicit semantics promises better search engines

Much easier heterogeneous data integration
  Data on the Web is inherently heterogeneous

Page 11: Open Conceptual Data Models

Application Areas – Present & Future

Social networking, e-commerce, collaborative working
  Require shareable, standards-based, cross-platform conceptual views of data

Data portability
  Needed as Web users maintain multiple points of presence – blogs, social network accounts etc.

Open business models
  Require exchange & integration of large amounts of data

Scientific research – sharing of knowledge & findings
  Requires transparent access to distributed heterogeneous data
  Requires database integration using global schema

Autonomous intelligent agents
  Free humans from large-volume information processing

Page 12: Open Conceptual Data Models

Semantic Web Technology Benefits

What Semantic Web technologies bring:

Ontologies
  Can represent common semantics
    Spanning databases, applications, enterprises, on-line communities
  Act as a shared conceptual model
  Provide common models (FOAF, SIOC etc)

Common Semantics (Ontologies) & Common Data Representation (RDF)
  Enable cross data source querying using SPARQL
    Content from several sites can be combined / explored
    Querying using proprietary APIs unnecessary
    Brute force data merging unnecessary

Open Data Formats, Platform Independence, Common Models
  Allow data portability and data integration

Page 13: Open Conceptual Data Models

Realizing Conceptual Models

Ontologies
  Provide the building blocks of Semantic Web conceptual models
  Define the concepts and their relationships in a domain of interest

Describing Classes & Properties – Ontology Languages
  RDFS
    Introduces the notions of concepts (classes) & instances
  OWL
    Adds more vocabulary for describing:
      relations between classes
      cardinality
      richer typing of properties, etc.

Page 14: Open Conceptual Data Models

Goodness of Fit

RDF was designed from the ground up as a metadata data model

RDF / RDFS / OWL work directly at the level of conceptual models

Conceptual model terminology matches RDF/OWL terminology
  Concepts, entities, attributes, relationships

A natural fit! RDF lends itself naturally to describing conceptual models

Page 15: Open Conceptual Data Models

Semantic Expressivity

DDL-based Relational Model
  Relationship between two entities isn’t explicit
  Foreign key relating two rows in separate tables doesn’t express the nature of the relationship
  Semantics must often be inferred from table definitions

RDF-based Conceptual Model
  Relationship between two entities is stated explicitly by predicate in subject-predicate-object triple
  Semantic expressivity of RDF/RDFS/OWL is much better than DDL
  Has richer semantic content than equivalent DDL-based logical/relational model
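As a rough illustration of the contrast on this slide, the sketch below places a relational row with a foreign key next to the same facts written as subject-predicate-object triples. The `http://example.org/music/` namespace and the predicate names (`made`, `hasTrack`, `title`) are invented for this example and are not taken from any published ontology.

```python
EX = "http://example.org/music/"  # hypothetical namespace, not from the slides

# Relational view: the foreign key links two rows but does not say
# what kind of relationship it is.
track_row = {"id": 7, "title": "Imagine", "record_id": 3}

# RDF view: each statement is (subject, predicate, object) and the
# predicate names the relationship explicitly.
triples = [
    (EX + "artist/john_lennon", EX + "schema#made", EX + "record/imagine"),
    (EX + "record/imagine", EX + "schema#hasTrack", EX + "track/imagine"),
    (EX + "track/imagine", EX + "schema#title", "Imagine"),
]


def relationships_of(subject, triples):
    """Return the (predicate, object) pairs stated about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


record_links = relationships_of(EX + "record/imagine", triples)
```

Here `record_links` carries the relationship name (`hasTrack`) along with its target, whereas `track_row["record_id"]` is only a number whose meaning has to be inferred from the table definitions.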

Page 16: Open Conceptual Data Models

RDF Conceptual Model – Artist / Records / Tracks

Page 17: Open Conceptual Data Models

Global Granular Information Sharing

Traditional Logical/Relational Data Model
  Schema described by DDL is internal to DBMS
  Primary keys identifying an individual table row (i.e. entity instance) not globally unique, not easily usable outside host DBMS
  Gives rise to ‘data silos’

RDF’s use of HTTP-based URLs
  Externalises the data and schema
  Makes both globally accessible & scalable
  Provides globally unique IDs for entities/relations/classes
  A vehicle for granular, global information sharing down to the equivalent of the record level

Page 18: Open Conceptual Data Models

Linked Data – What is It?

A method for exposing, sharing & connecting data on the Web

A term coined by Tim Berners-Lee that describes HTTP-based Data Access by Reference for the Web

Open Data Access & Connectivity mechanism for the Web

A richer linking mechanism for the Web that takes us from Hypertext Links (Document to Document) to Hyperdata Links (across things that documents are about)

Page 19: Open Conceptual Data Models

Linked Data – Why Is It Important

It exposes the compound nature of Web Resources
  Information resources (Containers) are uniquely identified & referenceable
  Entities within Containers are uniquely identified & referenceable

It provides an Open Data Access & Connectivity mechanism for the Web

It delivers a powerful mechanism for meshing disparate and heterogeneous data sources

Page 20: Open Conceptual Data Models

Linked Data Model

Changes the focus from linked documents to linked entities

The document as a data container becomes less relevant

Page 21: Open Conceptual Data Models

Hyperdata Links Between Data Objects

Page 22: Open Conceptual Data Models

Linked Data Benefits – Natural Navigation

Natural Navigation Through Typed Links
  RDF entities are identified by dereferenceable URIs (URLs)
  Navigating from one data item to another is easy
    One click to dereference in Semantic Web Browser, e.g. OpenLink Data Explorer
  URI of object in an RDF statement is a typed link
    Link’s “type” is defined by the statement predicate

Relational/Logical Model
  Cumbersome
  Requires SQL joins + typically Object-Relational mapping
    e.g. in C#: track = lennonAlbum.Tracks["Imagine"]

Page 23: Open Conceptual Data Models

Linked Data Benefits - Aggregatable Data

Often desirable to have an integrated view of all the data available about an item or topic

Database Realm
  Integration problematic, difficult to combine logical schemas

Semantic Web
  Data aggregation is easy: every resource has a unique URI
    Individual items can be linked
    Conceptual models can be linked
  Cross-domain links enrich domain knowledge
    Different facets of the same entity may be described by different URIs minted by different authors
    Can be linked, e.g. owl:sameAs, rdf:type predicates
    May expose facts not directly represented in any one source
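The aggregation idea on this slide can be sketched as follows: two sources describe the same entity under different URIs, and an owl:sameAs link lets a consumer merge both descriptions. This is a minimal, assumption-laden sketch, not how any particular triple store implements it. The site URIs and the `birthYear` predicate are hypothetical; the owl:sameAs and foaf:name URIs are the standard ones.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def merge_by_same_as(triples):
    """Group statements under one canonical URI per owl:sameAs cluster.

    The canonical URI is simply the lexicographically smallest one in the
    cluster (an arbitrary choice for this sketch). Only subjects are
    canonicalized; object URIs are left untouched.
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[max(root_s, root_o)] = min(root_s, root_o)

    merged = defaultdict(set)
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged[find(s)].add((p, o))
    return dict(merged)


# Hypothetical URIs minted by two different authors for the same person.
A = "http://site-a.example/people/lennon"
B = "http://site-b.example/artist/42"
BIRTH_YEAR = "http://site-b.example/schema#birthYear"  # hypothetical predicate

aggregated = merge_by_same_as([
    (A, FOAF_NAME, "John Lennon"),
    (B, BIRTH_YEAR, "1940"),
    (A, OWL_SAME_AS, B),
])
```

After merging, the name from one source and the birth year from the other sit under a single key, which is the "facts not directly represented in any one source" effect in miniature.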

Page 24: Open Conceptual Data Models

Linked Data – Data Aggregation

Page 25: Open Conceptual Data Models

Linked Data Benefits - Self Describing Data

RDF
  A technology for creating self-describing Web resources
  Entity’s type definition ‘accompanies’ it using rdf:type
  An RDF dataset can be queried using SPARQL without knowing anything beforehand about the data
  Provides the basis for powerful data exploration tools

Logical / Relational Schema
  Users / applications need a detailed understanding of the schema to use and navigate the data
  Application’s knowledge of the schema typically hardcoded
  Ad-hoc end-user data exploration potentially error prone

Page 26: Open Conceptual Data Models

Linked Data Benefits - SPARQL

If a user agent has no built-in knowledge of a particular RDF subject, predicate or object, it can use the URI to retrieve the information

The Power of SPARQL
  Discover what sorts of things a data source contains
    select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }
  Determine all the properties of an entity class
    select * where { <http://my.org/resourceTypes/Department> ?property ?hasValue }
  Determine all the properties and values of an entity instance
    DESCRIBE <http://my.org/resource/Accounts>

No prior knowledge of the RDF data source is needed
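To make the "no prior knowledge" point concrete, here is a toy single-pattern matcher over an in-memory list of triples. It only mimics the shape of the first two queries above and is in no way a SPARQL engine. The `Accounts` and `Department` URIs come from the slide; the `Alice` and `Employee` URIs and the `schema#name` predicate are assumptions added for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def match(triples, pattern):
    """Yield variable bindings for one triple pattern.

    Pattern terms starting with '?' are variables; anything else must
    equal the corresponding term of the triple.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


triples = [
    ("http://my.org/resource/Accounts", RDF_TYPE,
     "http://my.org/resourceTypes/Department"),
    ("http://my.org/resource/Accounts", "http://my.org/schema#name", "Accounts"),
    ("http://my.org/resource/Alice", RDF_TYPE,
     "http://my.org/resourceTypes/Employee"),
]

# Roughly: select distinct ?ObjectType where { ?URI a ?ObjectType }
types = sorted({b["?ObjectType"]
                for b in match(triples, ("?URI", RDF_TYPE, "?ObjectType"))})

# Roughly: select * where { <.../Accounts> ?property ?hasValue }
described = list(match(
    triples, ("http://my.org/resource/Accounts", "?property", "?hasValue")))
```

Neither lookup names a table or column in advance; the only fixed vocabulary is rdf:type, which is why exploration can start from an unknown data source.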

Page 27: Open Conceptual Data Models

Virtuoso - Linked Data Generation Options

Conceptual layer insulates Linked Data consumers from RDFization infrastructure & data source heterogeneity

Page 28: Open Conceptual Data Models

Virtuoso RDF Views

Expose relational data as RDF
  Provide the means to move from a logical model view to a conceptual model view

Available for querying through SPARQL or SPASQL (SPARQL embedded in SQL)

No physical regeneration of relational data

RDF Views = Virtuoso RDF Meta-Schema + Meta-Schema Language

MSL = A domain specific, declarative language for mapping a logical SQL data model to a conceptual RDF data model

Page 29: Open Conceptual Data Models

Northwind Demo Database: RDF View Definition Extract

Demo.demo.Customers table columns: Customer ID, Company Name, Contact Name, Contact Title, Address, City, Postal Code, Country, Phone, Fax

Northwind RDF View Definition

prefix northwind: <http://www.openlinksw.com/schemas/northwind#>

create iri class northwind:Customer <http://^{URIQADefaultHost}^/Northwind/Customer/%U#this> (in customer_id varchar not null)

alter quad storage virtrdf:DefaultQuadStorage
from Demo.demo.Customers as customers
from Demo.demo.Orders as orders …
{
  create virtrdf:NorthwindDemo as graph iri ("http://^{URIQADefaultHost}^/Northwind") {
    northwind:Customer(customers.CustomerID) a foaf:Organization as virtrdf:Customer-CustomerID ;
      northwind:companyName customers.CompanyName as … ;
      northwind:fax customers.Fax as virtrdf:Customer-fax .

    northwind:Customer(orders.CustomerID) northwind:has_order northwind:Order(orders.OrderID) as virtrdf:Order-has_order .
  }
}
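The effect of such a view can be approximated in ordinary code, purely as an illustration of the mapping idea and not of how Virtuoso evaluates RDF Views. The sketch below turns a customer row into triples on demand using an IRI template shaped like the one in the extract. It assumes the `%U` placeholder behaves like URL-encoding of the key, takes the host from the `http://demo.openlinksw.com/Northwind/` prefix used elsewhere in this deck, and covers only three of the mapped properties.

```python
from urllib.parse import quote

BASE = "http://demo.openlinksw.com/Northwind/"  # host assumed from the deck's demo prefix
NORTHWIND = "http://www.openlinksw.com/schemas/northwind#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"


def customer_iri(customer_id):
    """Build an entity IRI from a primary key (assumes %U means URL-encode)."""
    return f"{BASE}Customer/{quote(customer_id, safe='')}#this"


def customer_triples(rows):
    """Lazily yield triples for customer rows.

    A generator is used so that nothing is copied or stored up front,
    loosely mirroring "no physical regeneration of relational data".
    """
    for row in rows:
        subject = customer_iri(row["CustomerID"])
        yield (subject, RDF_TYPE, FOAF_ORGANIZATION)
        yield (subject, NORTHWIND + "companyName", row["CompanyName"])
        yield (subject, NORTHWIND + "fax", row["Fax"])


rows = [{"CustomerID": "ALFKI",
         "CompanyName": "Alfreds Futterkiste",
         "Fax": "030-0076545"}]
triples = list(customer_triples(rows))
```

The relational key `ALFKI` becomes part of a globally usable identifier, and each column becomes a named property of that entity.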

Page 30: Open Conceptual Data Models

Northwind Demo Database: Customer Table to RDF Entity Mapping

Customers table row:
  Customer ID: ALFKI
  Company Name: Alfreds Futterkiste
  Contact Name: Maria Anders
  Contact Title: Sales Representative
  Address: Obere Str. 57
  City: Berlin
  Postal Code: 12209
  Country: Germany
  Phone: 030-0074321
  Fax: 030-0076545

Orders Table (Order ID, CustomerID):
  10643 ALFKI …
  10692 ALFKI …

RDF entity Customer/ALFKI#this (prefix <http://demo.openlinksw.com/Northwind/>):
  companyName: Alfreds Futterkiste
  contactName: Maria Anders
  contactTitle: Sales Representative
  address: Obere Str. 57
  city: Berlin
  PostalCode: 12209
  country: Germany
  phone: 030-0074321
  fax: 030-0076545
  has_order: Order/10643#this
  has_order: Order/10692#this

Order/10643#this has_customer Customer/ALFKI#this
Order/10692#this has_customer Customer/ALFKI#this

Page 31: Open Conceptual Data Models

LinqToRdf + Virtuoso

Page 32: Open Conceptual Data Models

LinqToRdf to MusicBrainz - Conceptual Model Veneer

Page 33: Open Conceptual Data Models

ADO.NET Data Services & Entity Data Model

A framework for exposing ‘pure data’ service over HTTP

No support for RDF
  Gains none of RDF’s inherent benefits

Lack of platform independence & standards compliance
  Supports REST-style interfaces
  Supports Atom, JSON and XML payloads
  But
    Server-side: Windows only
    Consuming Astoria services (Astoria being the code name for ADO.NET Data Services) at a higher level requires Windows .NET client or Silverlight-supported browser

Page 34: Open Conceptual Data Models

ADO.NET Data Services & Entity Data Model

Server-side only conceptual model

Powerful URL addressing to query/navigate/sort/filter etc
  Customers collection: http://myserver/data.svc/Customers
  Customer ALFKI: http://myserver/data.svc/Customers('ALFKI')
  Customer ALFKI's orders: http://myserver/data.svc/Customers('ALFKI')/Orders

But
  Client must know conceptual schema, e.g. to construct above URIs

Lack of Dereferenceable Entity IDs
  Ability to discover entities and dereference their descriptions (attributes/relations) is confined to the facilities offered by .NET

c.f. SPARQL’s ability to handle unknown data sources
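The "client must know conceptual schema" point can be seen in a small sketch of how a client would assemble the addressing URLs shown on this slide. The helper names are hypothetical and this is not a client library for ADO.NET Data Services; it only reproduces the three URL shapes given above, with simple string keys.

```python
from urllib.parse import quote

SERVICE_ROOT = "http://myserver/data.svc"  # service root as shown on the slide


def entity_set_url(entity_set):
    """URL of a whole collection, e.g. Customers."""
    return f"{SERVICE_ROOT}/{entity_set}"


def entity_url(entity_set, key):
    """URL of one entity addressed by a string key."""
    return f"{entity_set_url(entity_set)}('{quote(key, safe='')}')"


def navigation_url(entity_set, key, navigation_property):
    """URL that follows a relationship from one entity, e.g. its Orders."""
    return f"{entity_url(entity_set, key)}/{navigation_property}"


# The caller has to supply "Customers", "ALFKI" and "Orders" itself:
# these names come from the schema, not from anything the data returns.
orders_url = navigation_url("Customers", "ALFKI", "Orders")
```

Every argument passed in is schema knowledge held by the client, in contrast with following a dereferenceable URI found inside the data.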

Page 35: Open Conceptual Data Models

ADO.NET Data Services & Entity Data Model

No Support for Non-SQL Data Sources
  Astoria is aimed exclusively at making relational data Web accessible

c.f. Semantic Web & Linked Data
  Recognize that vast amounts of data reside in unstructured and semi-structured data sources
  Support for embedding RDF into existing (X)HTML
    RDFa, GRDDL, eRDF
  Emerging tools for converting non-RDF data to RDF
  Emerging tools for exposing SQL data as RDF

Astoria lacks scalability & scope of Semantic Web technologies