Web Science & Technologies University of Koblenz ▪ Landau, Germany Information-Rich Programming in F# with Semantic Data

Information-Rich Programming in F# with Semantic Data

Jan 26, 2015

Steffen Staab

Programming with rich data frequently implies that one
needs to search for, understand, integrate and program with
new data - with each of these steps constituting a major
obstacle to successful data use.

In this talk we will explain and demonstrate how our approach,
LITEQ (Language Integrated Types, Extensions and Queries for
RDF Graphs), which is realized as part of the F# / Visual Studio
environment, supports the software developer. Using the extended
IDE, the developer may now

a. explore new, previously unseen data sources,
which are either natively in RDF or mapped into RDF;
b. use the exploration of schemata and data in order to
construct types and objects in the F# environment;
c. automatically map between data and programming language objects in
order to make them persistent in the data source;
d. have extended typing functionality added to the F#
environment and resulting from the exploration of the data source
and its mapping into F#.

Core to this approach is the novel node path query language, NPQL,
that allows for interactive, intuitive exploration of data schemata and
data proper as well as for the mapping and definition
of types, object collections and individual objects.
Beyond the existing type provider mechanism for F#
our approach also allows for property-based navigation
and runtime querying for data objects.
Transcript
Page 1: Information-Rich Programming in F# with Semantic Data

Web Science & Technologies

University of Koblenz ▪ Landau, Germany

Information-Rich Programming in F#

with Semantic Data

Page 2: Information-Rich Programming in F# with Semantic Data

Steffen [email protected] 2

WeST

Linked Open Data Cloud

"… the Web of Linked Data consisting of more than 30 billion RDF triples from hundreds of data sources …"

Gerhard Weikum, SIGMOD Blog, 6.3.2013, http://wp.sigmod.org/

Where’s the Data in the Big Data Wave?

Page 3: Information-Rich Programming in F# with Semantic Data


Some „Bubbles“ of the LOD Cloud

Page 4: Information-Rich Programming in F# with Semantic Data


RDF: Simple Foundations

Page 5: Information-Rich Programming in F# with Semantic Data


Example RDF Graph

Native graph

OR

R2RML: RDB to RDF Mapping Language (W3C Recommendation)
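
An RDF graph can be read as a set of (subject, predicate, object) triples. The following is a minimal sketch in Python (not the F# tooling of the talk); the `match` helper and the individual `ex:Hasso` are hypothetical and only serve as illustration, while the `ex:` class and property names follow the dog-license example used in the talk.

```python
# A tiny RDF graph as a set of (subject, predicate, object) triples.
# ex:Hasso and the literal "Bob" are hypothetical sample data.
GRAPH = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Creature"),
    ("ex:Person", "rdfs:subClassOf", "ex:Creature"),
    ("ex:Hasso", "rdf:type", "ex:Dog"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Hasso", "ex:hasOwner", "ex:Bob"),
    ("ex:Bob", "ex:hasName", "Bob"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    }


# All objects of ex:hasOwner triples, i.e. everything that owns something.
owners = {o for (_, _, o) in match(GRAPH, p="ex:hasOwner")}
print(owners)
```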

Page 6: Information-Rich Programming in F# with Semantic Data


Agenda

SchemEX

Construction of schema-based index

Schema induction

LiteQ – Language integrated types, extensions and queries for RDF graphs

Exploring Programming, Typing

Evaluation of LITEQ (NPQL) against SPARQL

Understandability

Ease of use

Page 7: Information-Rich Programming in F# with Semantic Data


Exploring a data source

Using a data source

Programming against unknown data source

Page 8: Information-Rich Programming in F# with Semantic Data


Example application

• Goal: Application that helps to collect the dog license fee
• Send email reminders to dog owners
• Data is given as RDF graph

Page 9: Information-Rich Programming in F# with Semantic Data


Programmer‘s Task 1: Schema Exploration

Schema exploration & identification of important RDF types
• Find RDF types representing dogs and persons

Page 10: Information-Rich Programming in F# with Semantic Data


Naive Approach Task 1: Schema Exploration

Schema exploration & identification of important RDF types
• Find RDF types representing dogs and persons

Tooling for Naïve Approach: SPARQL Query Formulation

Page 11: Information-Rich Programming in F# with Semantic Data


Programmer‘s Task 2: Code Type Creation

Code type creation in host language
• Convert the identified dog and person RDF types to code types in the host language

    type exDog(uri) = class
        inherit exCreature(uri)
        member this.hasOwner : exPerson = …
        member this.TaxNo : Integer = …
    end

    type exPerson(uri) = class
        inherit exCreature(uri)
    end

    type exCreature(uri) = class
        member this.hasName : String = …
        member this.hasAge : int = …
    end

Page 12: Information-Rich Programming in F# with Semantic Data


Programmer‘s Task 3: Data querying

Data querying
• Write a query that returns all dog owners

Page 13: Information-Rich Programming in F# with Semantic Data


Naive Approach Task 3: Data querying

Data querying
• Write a query that returns all dog owners

Tooling for Naive Approach: SPARQL Query formulation
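
To make concrete what such a query computes, here is a minimal Python sketch of evaluating a conjunctive triple-pattern query ("all dog owners") over an in-memory triple set. It is not a SPARQL engine; the `solve` function and the sample individuals (`ex:Hasso`, `ex:Rex`, `ex:Alice`) are assumptions made for illustration.

```python
# Hypothetical sample data: one dog with an owner, one without.
GRAPH = {
    ("ex:Hasso", "rdf:type", "ex:Dog"),
    ("ex:Rex", "rdf:type", "ex:Dog"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Hasso", "ex:hasOwner", "ex:Bob"),
}


def solve(graph, patterns, binding=None):
    """Yield variable bindings that satisfy all triple patterns (a conjunctive query)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from solve(graph, rest, new)


# Counterpart of: SELECT ?owner WHERE { ?dog rdf:type ex:Dog . ?dog ex:hasOwner ?owner }
QUERY = [("?dog", "rdf:type", "ex:Dog"), ("?dog", "ex:hasOwner", "?owner")]
owners = sorted({b["?owner"] for b in solve(GRAPH, QUERY)})
print(owners)
```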

Page 14: Information-Rich Programming in F# with Semantic Data


Naive Approach Task 4: Object manipulation

Create the objects, manipulate them & make them persistent
• Develop functionality around query to send reminder

    let queryString = "SELECT ?owner WHERE {
        ?dog rdf:type ex:Dog .
        ?dog ex:hasOwner ?owner
    }"

    dbConnection.evaluate(queryString) |> Seq.iter (fun uri ->
        let p = new Person(uri)
        sendReminderEmail(p)
    )

Page 15: Information-Rich Programming in F# with Semantic Data


The LITEQ approach

Page 16: Information-Rich Programming in F# with Semantic Data


Node Path Query Language

Page 17: Information-Rich Programming in F# with Semantic Data


Graph Traversal with NPQL: Subtype Navigation >

rdf:Resource > ex:Creature

NPQL

Page 18: Information-Rich Programming in F# with Semantic Data


ex:Dog

Graph Traversal with NPQL: Property Navigation .

ex:Dog . ex:hasOwner

NPQL

Page 19: Information-Rich Programming in F# with Semantic Data


Extensional Semantics: Task 3 – Querying for Owners

• Select ex:Dog
• Walk through ex:hasOwner to ex:Person
• Use extension to retrieve all persons who own dogs: ex:Bob

rdf:Resource > ex:Dog
rdf:Resource > ex:Creature > ex:Dog
rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner
rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner -> Extension
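
The path steps above can be sketched as a small evaluator. This is a Python illustration under stated assumptions, not the LITEQ implementation: `evaluate` and `instances_of` are hypothetical helpers, the `rdfs:range` triple and the individuals other than `ex:Bob` are made-up sample data, and the extension of a path is taken to be the set of instances reached by following it.

```python
RDF_TYPE, SUBCLASS, RANGE = "rdf:type", "rdfs:subClassOf", "rdfs:range"

# Hypothetical schema and data for the dog-license example.
GRAPH = {
    ("ex:Creature", SUBCLASS, "rdf:Resource"),
    ("ex:Dog", SUBCLASS, "ex:Creature"),
    ("ex:Person", SUBCLASS, "ex:Creature"),
    ("ex:hasOwner", RANGE, "ex:Person"),
    ("ex:Hasso", RDF_TYPE, "ex:Dog"),
    ("ex:Bob", RDF_TYPE, "ex:Person"),
    ("ex:Alice", RDF_TYPE, "ex:Person"),
    ("ex:Hasso", "ex:hasOwner", "ex:Bob"),
}


def instances_of(graph, cls):
    """Instances of cls or of any of its (transitive) subclasses."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            if p == SUBCLASS and o == current and s not in classes:
                classes.add(s)
                frontier.append(s)
    return {s for (s, p, o) in graph if p == RDF_TYPE and o in classes}


def evaluate(graph, start, steps):
    """Walk a path of ('>', class) and ('.', property) steps.

    Returns the schema node reached and its extension (set of instances).
    """
    node, members = start, instances_of(graph, start)
    for op, name in steps:
        if op == ">":  # subtype navigation
            if (name, SUBCLASS, node) not in graph:
                raise ValueError(f"{name} is not a direct subclass of {node}")
            node, members = name, members & instances_of(graph, name)
        elif op == ".":  # property navigation
            members = {o for (s, p, o) in graph if p == name and s in members}
            node = next((o for (s, p, o) in graph if s == name and p == RANGE), None)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return node, members


# rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner -> Extension
PATH = [(">", "ex:Creature"), (">", "ex:Dog"), (".", "ex:hasOwner")]
node, owners = evaluate(GRAPH, "rdf:Resource", PATH)
print(node, owners)
```

Here `ex:Alice` is a person but owns no dog, so she is not part of the extension.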

NPQL

Page 20: Information-Rich Programming in F# with Semantic Data


Intensional Semantics: Task 2 - Creating Person Code Type

• Select ex:Person node
• "Intension" to get code type based on rdf type

rdf:Resource > ex:Creature > ex:Dog.hasOwner
rdf:Resource > ex:Creature > ex:Dog.hasOwner -> Intension

NPQL

    type exPerson(uri) = class
        inherit exCreature(uri)
    end

    type exCreature(uri) = class
        member this.hasName : String = …
        member this.hasAge : int = …
    end
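
The idea of deriving a code type from an RDF type can be sketched in Python with dynamically created classes. This is only an analogy under stated assumptions: LITEQ is realized in F#, and the `code_type` function, the `own_properties` attribute and the `rdfs:domain` triples below are hypothetical.

```python
SUBCLASS, DOMAIN = "rdfs:subClassOf", "rdfs:domain"

# Hypothetical schema triples for the example types.
SCHEMA = {
    ("ex:Creature", SUBCLASS, "rdf:Resource"),
    ("ex:Person", SUBCLASS, "ex:Creature"),
    ("ex:hasName", DOMAIN, "ex:Creature"),
    ("ex:hasAge", DOMAIN, "ex:Creature"),
}


def code_type(schema, rdf_class, cache=None):
    """Build a Python class for rdf_class, mirroring the RDF subclass hierarchy."""
    cache = {} if cache is None else cache
    if rdf_class in cache:
        return cache[rdf_class]
    parents = sorted(
        o for (s, p, o) in schema
        if s == rdf_class and p == SUBCLASS and o != "rdf:Resource"
    )
    bases = tuple(code_type(schema, parent, cache) for parent in parents) or (object,)
    # Properties declared directly on this RDF class.
    own_properties = sorted(s for (s, p, o) in schema if p == DOMAIN and o == rdf_class)

    def __init__(self, uri):
        self.uri = uri

    cls = type(
        rdf_class.replace(":", ""),
        bases,
        {"__init__": __init__, "own_properties": own_properties},
    )
    cache[rdf_class] = cls
    return cls


exPerson = code_type(SCHEMA, "ex:Person")
bob = exPerson("ex:Bob")
print(exPerson.__name__, [base.__name__ for base in exPerson.__bases__])
```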

Page 21: Information-Rich Programming in F# with Semantic Data


Autocompletion Semantics: Task 1 - Exploration

Suggestions during query writing
• Instances based on extensional semantics
• Types & Props based on intensional semantics

NPQL

rdf:Resource > ex:Creature
rdf:Resource > ex:Creature >

Suggested: ex:Person, ex:Dog
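
Schema-based suggestions of this kind can be sketched as a lookup over schema triples. The Python `suggest` function and the `rdfs:domain` triple below are assumptions for illustration and do not describe how the Visual Studio integration computes its completion list.

```python
SUBCLASS, DOMAIN = "rdfs:subClassOf", "rdfs:domain"

# Hypothetical schema triples for the example.
SCHEMA = {
    ("ex:Creature", SUBCLASS, "rdf:Resource"),
    ("ex:Dog", SUBCLASS, "ex:Creature"),
    ("ex:Person", SUBCLASS, "ex:Creature"),
    ("ex:hasOwner", DOMAIN, "ex:Dog"),
}


def suggest(schema, node, operator):
    """Completions for the path ending in `node` followed by `operator`."""
    if operator == ">":  # subtype navigation: offer direct subclasses
        return sorted(s for (s, p, o) in schema if p == SUBCLASS and o == node)
    if operator == ".":  # property navigation: offer properties declared on node
        return sorted(s for (s, p, o) in schema if p == DOMAIN and o == node)
    raise ValueError(f"unknown operator {operator!r}")


# Typing "rdf:Resource > ex:Creature >" offers the subclasses of ex:Creature.
print(suggest(SCHEMA, "ex:Creature", ">"))
```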

Page 22: Information-Rich Programming in F# with Semantic Data


Extensional Semantics: LA Conjunctive Queries

Left associative conjunctive query with projection

ex:Dog <- ex:hasOwner

NPQL
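
One possible reading of `ex:Dog <- ex:hasOwner`, assumed here and not confirmed by the slide text, is the conjunctive query "?x rdf:type ex:Dog and ?x ex:hasOwner ?y" projected onto ?x, i.e. all dogs that have an owner. A minimal Python sketch with a hypothetical `restrict` helper and made-up individuals:

```python
# Hypothetical sample data: ex:Rex is a dog without an owner.
GRAPH = {
    ("ex:Hasso", "rdf:type", "ex:Dog"),
    ("ex:Rex", "rdf:type", "ex:Dog"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Hasso", "ex:hasOwner", "ex:Bob"),
}


def restrict(graph, cls, prop):
    """Instances of cls that have at least one value for prop (projection on the subject)."""
    members = {s for (s, p, o) in graph if p == "rdf:type" and o == cls}
    return {s for (s, p, o) in graph if p == prop and s in members}


dogs_with_owner = restrict(GRAPH, "ex:Dog", "ex:hasOwner")
print(dogs_with_owner)
```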

Page 23: Information-Rich Programming in F# with Semantic Data


Host Language Extension: Task 4 – Create Objects

Create the objects, manipulation & persistence
• Develop the functionality around the query that will send the reminder using LITEQ in F#

Preliminary implementation in F#: http://west.uni-koblenz.de/Research/systems/liteq

Page 24: Information-Rich Programming in F# with Semantic Data

Web Science & Technologies

University of Koblenz ▪ Landau, Germany

Live demo of LITEQ in Visual Studio/F#

Page 25: Information-Rich Programming in F# with Semantic Data


Related Work

Task | LINQ | XML Type Provider | Freebase Type Provider | LITEQ current version | LITEQ Concept
1 Schema exploration | - | (✔) per doc | (✔) only trees | ✔ | ✔
2 Code type creation | - | (✔) erased types? | (✔) erased types | (✔) erased types | ✔ full hierarchy
3 Data querying | ✔ | - | ((✔)) very limited expressiv. | (✔) limited expressiv. | ✔ no full SPARQL
4 Object manipulation & persistence | (✔) | ✔ | - | ✔ no new object creation

Page 26: Information-Rich Programming in F# with Semantic Data


Future work wrt LITEQ

• Current implementation is a prototype
• Current implementation uses erased types
    At runtime, no type hierarchy is present
• Switch to generated types in the future
    Higher expressiveness in the host language exploiting type hierarchy
• Optimizations of LITEQ implementation necessary
    • Lazy evaluation
• Distinguish between design time and runtime
    • Not all types created at design time are needed at runtime
• Formalize query language and investigate expressiveness

Page 27: Information-Rich Programming in F# with Semantic Data


Challenge: Joint Type Inference

Data modeling world
• Description Logics
• RDF
• UML class diagrams

Program modeling world
• ML type inference

Page 28: Information-Rich Programming in F# with Semantic Data


Agenda

SchemEX

Where do I find relevant data?

Efficient construction of a schema-level index

LiteQ – Language integrated types, extensions and queries for RDF graphs

Exploring Programming, Typing

Evaluation of LITEQ (NPQL) vs. SPARQL

Understandability

Ease of use

Page 29: Information-Rich Programming in F# with Semantic Data


Preliminary Evaluation of LITEQ/NPQL

Focused on NPQL
• Reason: Test subjects lacked knowledge of F# and functional programming for evaluating LITEQ in full
• Comparing NPQL against SPARQL

Main hypothesis of evaluation
• NPQL with autocompletion allows for effective query writing in a more efficient manner than SPARQL

Thus: some of the advantages of LITEQ cannot show up in the evaluation!

Page 30: Information-Rich Programming in F# with Semantic Data


Evaluation Subjects

Evaluation with 11 participants
• 1 subject eliminated a posteriori from the analysis of the evaluation, because he could not deal with SPARQL at all
• 10 subjects remaining for analysis

Participants
• Undergraduate students
• PhD students
• PostDocs

Page 31: Information-Rich Programming in F# with Semantic Data


Evaluation - Setup

1. Pre-questionnaire

2. Training in RDF, SPARQL & NPQL

3. Experimental tasks to be solved by subjects

4. Post-questionnaire

Page 32: Information-Rich Programming in F# with Semantic Data


Phase 1: Pre-Questionnaire – Knowledge & skills

• Programming: All “Intermediate” or above
• Object-orientation: 8 “Intermediate” or above
• Functional programming:
    4 “Intermediate” or above (Lisp, Haskell, F# (once))
    4 “none”
• .NET:
    1 “Expert”
    2 “Beginner”
    7 “none”
• SPARQL:
    3 “Intermediate” or above [SPARQL experts]
    7 below “Intermediate” [SPARQL novices]

Page 33: Information-Rich Programming in F# with Semantic Data


Phase 2: Training in RDF, SPARQL, NPQL

Training in RDF & SPARQL
• Presentation of RDF & SPARQL (20 minutes)
• Practical exercise writing SPARQL queries in the Web interface (5 minutes)

Training in NPQL
• Practical exercise writing NPQL queries in Visual Studio (5 minutes)

Page 34: Information-Rich Programming in F# with Semantic Data


Phase 3: Solving experimental tasks by subjects

9 different experimental tasks to solve
• Half of the tasks in NPQL using Visual Studio
• Other half using SPARQL and a web interface

Task types
• Navigation and exploration of a data source (Task 1)
• Retrieving and answering questions about the data (Task 3)
• 2 tasks were not solvable in NPQL
    • Investigating how users deal with limits of the language

Evaluation measure
• Duration to complete each task

Page 35: Information-Rich Programming in F# with Semantic Data


Evaluation across different user types

Page 36: Information-Rich Programming in F# with Semantic Data


Evaluations per Task

Page 37: Information-Rich Programming in F# with Semantic Data


Phase 4: Post-Questionnaire

“Do you want to explore a data source in your IDE?”

4 “yes”

3 “no, prefer separation of steps”

3 “no preference”

“NPQL is easier to use than SPARQL”

7 “agree” or above

Other

• Better support when writing queries in SPARQL

• Better response times for interactive working with NPQL

My conclusion: Though LITEQ is still in pre-alpha status, advantages became visible in the preliminary user evaluation.

Page 38: Information-Rich Programming in F# with Semantic Data


Agenda

SchemEX

Construction of schema-based index

Schema induction

LiteQ – Language integrated types, extensions and queries for RDF graphs

Exploring Programming, Typing

Evaluation of LITEQ (NPQL) against SPARQL

Understandability

Ease of use

Page 39: Information-Rich Programming in F# with Semantic Data


Searching the LOD cloud

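[Figure: query graph in which node x has the types foaf:Document and swrc:InProceedings and a dc:creator edge to a node of type fb:Computer_Scientist; the data source is unknown (?)]

    SELECT ?x
    WHERE {
      ?x rdf:type foaf:Document .
      ?x rdf:type swrc:InProceedings .
      ?x dc:creator ?y .
      ?y rdf:type fb:Computer_Scientist
    }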

Page 40: Information-Rich Programming in F# with Semantic Data


Searching the LOD cloud

    SELECT ?x
    WHERE {
      ?x rdf:type foaf:Document .
      ?x rdf:type swrc:InProceedings .
      ?x dc:creator ?y .
      ?y rdf:type fb:Computer_Scientist
    }

Index

Where?
• ACM
• DBLP

Page 41: Information-Rich Programming in F# with Semantic Data


Schema-level index

Schema information on LOD
• Explicit: assigning class types (Entity rdf:type Class)
• Implicit: modelling attributes (Entity Property Entity 2)

Page 42: Information-Rich Programming in F# with Semantic Data


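Schema-level index

[Figure: entities E1 and E2 and a literal XYZ in data source DS1, with classes C1, C2, C3 and properties P1, P2; the schema-level index maps the schema structure (C1, C2, C3, P1, P2) to DS1]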

Page 43: Information-Rich Programming in F# with Semantic Data


Typecluster

Entities with the same Set of types

[Figure: type cluster TCj over the classes C1, C2, ..., Cn, pointing to the data sources DS1, DS2, ..., DSm]
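
Grouping entities by their set of types can be sketched as follows. This is a minimal Python illustration, not the SchemEX implementation; the function name `type_clusters` and the sample quads (triple plus data source) are assumptions.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical quads: (subject, predicate, object, data source).
QUADS = [
    ("dblp:paper1", RDF_TYPE, "foaf:Document", "DBLP"),
    ("dblp:paper1", RDF_TYPE, "swrc:InProceedings", "DBLP"),
    ("acm:paper7", RDF_TYPE, "foaf:Document", "ACM"),
    ("acm:paper7", RDF_TYPE, "swrc:InProceedings", "ACM"),
    ("dblp:alice", RDF_TYPE, "fb:Computer_Scientist", "DBLP"),
]


def type_clusters(quads):
    """Map each distinct set of types to the data sources containing such entities."""
    types, sources = defaultdict(set), defaultdict(set)
    for s, p, o, ds in quads:
        sources[s].add(ds)
        if p == RDF_TYPE:
            types[s].add(o)
    clusters = defaultdict(set)
    for entity, type_set in types.items():
        clusters[frozenset(type_set)] |= sources[entity]
    return dict(clusters)


clusters = type_clusters(QUADS)
print(clusters[frozenset({"foaf:Document", "swrc:InProceedings"})])
```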

Page 44: Information-Rich Programming in F# with Semantic Data


Typecluster: Example

foaf:Document swrc:InProceedings

DBLP ACM

tc2309

Page 45: Information-Rich Programming in F# with Semantic Data


Bi-Simulation

Entities are equivalent, if they refer with the same attributes to equivalent entities

Restriction: 1-Bi-Simulation

[Figure: bi-simulation BSi over the properties P1, P2, ..., Pn, pointing to the data sources DS1, DS2, ..., DSm]
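
With the restriction to depth 1, entities can be grouped simply by the set of properties they use. A minimal Python sketch, assuming this simplified reading of 1-bi-simulation; `property_clusters` and the sample quads are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical quads: (subject, predicate, object, data source).
QUADS = [
    ("bbc:programme1", "dc:creator", "bbc:person1", "BBC"),
    ("dblp:paper1", "dc:creator", "dblp:alice", "DBLP"),
    ("dblp:paper1", RDF_TYPE, "foaf:Document", "DBLP"),
    ("dblp:alice", "foaf:name", "Alice", "DBLP"),
]


def property_clusters(quads):
    """Map each distinct set of outgoing properties to the data sources using it."""
    props, sources = defaultdict(set), defaultdict(set)
    for s, p, o, ds in quads:
        if p != RDF_TYPE:  # type statements belong to the type clusters instead
            props[s].add(p)
            sources[s].add(ds)
    clusters = defaultdict(set)
    for entity, prop_set in props.items():
        clusters[frozenset(prop_set)] |= sources[entity]
    return dict(clusters)


bisim = property_clusters(QUADS)
print(bisim[frozenset({"dc:creator"})])
```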

Page 46: Information-Rich Programming in F# with Semantic Data


Bi-Simulation: Example

dc:creator

BBC DBLP

bs2608

Page 47: Information-Rich Programming in F# with Semantic Data


SchemEX: Combination TC and Bi-Simulation

Partition of TC based on 1-Bi-Simulation with restrictions on the destination TC

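[Figure: schema layer with type clusters TCj (classes C1, C2, ..., Cn) and TCk (classes C45, C2, ..., Cn'), bi-simulation BSi over properties P1, P2, ..., Pn, and equivalence classes EQCj; payload layer with data sources DS1, DS2, ..., DSm]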
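
The combination can be sketched by keying each entity on its own type set together with its properties and the type sets of the entities they point to. This Python sketch is a simplification under stated assumptions: `schemex_index` is a hypothetical function, the quads are made up, and lookups use exact key matches rather than full query answering.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical quads: (subject, predicate, object, data source).
QUADS = [
    ("dblp:paper1", RDF_TYPE, "foaf:Document", "DBLP"),
    ("dblp:paper1", RDF_TYPE, "swrc:InProceedings", "DBLP"),
    ("dblp:paper1", "dc:creator", "dblp:alice", "DBLP"),
    ("dblp:alice", RDF_TYPE, "fb:Computer_Scientist", "DBLP"),
    ("acm:paper7", RDF_TYPE, "foaf:Document", "ACM"),
    ("acm:paper7", RDF_TYPE, "swrc:InProceedings", "ACM"),
]


def schemex_index(quads):
    """Map (type set, {(property, target type set)}) to data sources."""
    types = defaultdict(set)
    for s, p, o, ds in quads:
        if p == RDF_TYPE:
            types[s].add(o)
    links, sources = defaultdict(set), defaultdict(set)
    for s, p, o, ds in quads:
        sources[s].add(ds)
        if p != RDF_TYPE:
            # Restrict the property by the type cluster of its destination.
            links[s].add((p, frozenset(types.get(o, ()))))
    index = defaultdict(set)
    for entity, entity_sources in sources.items():
        key = (frozenset(types.get(entity, ())), frozenset(links.get(entity, ())))
        index[key] |= entity_sources
    return dict(index)


index = schemex_index(QUADS)

# Schema-level key for: documents in proceedings created by a computer scientist.
query_key = (
    frozenset({"foaf:Document", "swrc:InProceedings"}),
    frozenset({("dc:creator", frozenset({"fb:Computer_Scientist"}))}),
)
print(index.get(query_key))
```

In this sample only DBLP is returned, because the ACM paper has no dc:creator link to a typed computer scientist.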

Page 48: Information-Rich Programming in F# with Semantic Data


SchemEX: Example

[Figure: index entry matching the query: type clusters tc2309 (foaf:Document, swrc:InProceedings) and tc2101 (fb:Computer_Scientist), bi-simulation bs2608 (dc:creator), equivalence class eqc707, payload DBLP ...]

    SELECT ?x
    WHERE {
      ?x rdf:type foaf:Document .
      ?x rdf:type swrc:InProceedings .
      ?x dc:creator ?y .
      ?y rdf:type fb:Computer_Scientist
    }

Page 49: Information-Rich Programming in F# with Semantic Data


SchemEX: Computation

Precise computation: Brute-Force

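[Figure: schema layer with type clusters TCj (classes C1, C2, ..., Cn) and TCk (classes C12, C2, ..., Cn'), bi-simulation BSi over properties P1, P2, ..., Pn, and equivalence classes EQCj; payload layer with data sources DS1, DS2, ..., DSm]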

Smarter Approach?

Page 50: Information-Rich Programming in F# with Semantic Data


Stream-based Computation of SchemEX

LOD Crawler: Stream of n-Quads (triple + data source)

… Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1

FiFo

[Figure: FiFo cache over the quad stream, showing numbered entities linked to the classes C1, C2, C3]
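
A windowed, stream-based computation can be sketched with a FIFO cache of entities. The eviction policy, the `window` parameter and the function `stream_type_clusters` are assumptions for this Python illustration (shown for type clusters only); the slides state only that a FiFo is used over the quad stream.

```python
from collections import OrderedDict, defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical stream of quads; statements about the same entity are interleaved.
STREAM = [
    ("dblp:paper1", RDF_TYPE, "foaf:Document", "DBLP"),
    ("acm:paper7", RDF_TYPE, "foaf:Document", "ACM"),
    ("dblp:paper1", RDF_TYPE, "swrc:InProceedings", "DBLP"),
    ("acm:paper7", RDF_TYPE, "swrc:InProceedings", "ACM"),
]


def stream_type_clusters(quads, window):
    """Build type clusters in one pass, keeping at most `window` entities in memory."""
    cache = OrderedDict()  # entity -> (types seen so far, data sources seen so far)
    index = defaultdict(set)

    def flush(types, sources):
        if types:
            index[frozenset(types)] |= sources

    for s, p, o, ds in quads:
        if s not in cache:
            if len(cache) >= window:
                # Evict the oldest entity and write what is known about it to the index.
                _, (old_types, old_sources) = cache.popitem(last=False)
                flush(old_types, old_sources)
            cache[s] = (set(), set())
        types, sources = cache[s]
        sources.add(ds)
        if p == RDF_TYPE:
            types.add(o)
    for types, sources in cache.values():
        flush(types, sources)
    return dict(index)


exact = stream_type_clusters(STREAM, window=2)
approx = stream_type_clusters(STREAM, window=1)
print(len(exact), len(approx))
```

With a window that is too small, an entity evicted early is later treated as new, so its type set is split across several clusters; this is the kind of approximation error the window size trades against memory.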

Page 51: Information-Rich Programming in F# with Semantic Data


Quality of Approximated Index

Stream-based computation vs. brute force

Data set of 11 million triples

Page 52: Information-Rich Programming in F# with Semantic Data


SchemEX @ BTC 2011

SchemEX
• Allows complex queries (Star, Chain)
• Scalable computation
• High quality

Index over BTC 2011 data
• 2.17 billion triples
• Index: 55 million triples

Commodity hardware
• VM: 1 core, 4 GB RAM
• Throughput: 39,500 triples / second
• Computation of full index: 15 h

1st place at BTC 2011
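
The reported figures are consistent with each other, as a quick back-of-the-envelope check in Python shows (the variable names are chosen here for illustration):

```python
TRIPLES = 2.17e9          # size of the BTC 2011 data set
INDEX_TRIPLES = 55e6      # size of the resulting index
THROUGHPUT = 39_500       # triples processed per second

hours = TRIPLES / THROUGHPUT / 3600   # total processing time in hours
ratio = INDEX_TRIPLES / TRIPLES       # index size relative to the data

print(f"{hours:.1f} h, index is {ratio:.1%} of the data")
```

At the stated throughput the full pass takes a little over 15 hours, and the index is roughly 2.5% of the size of the data it describes.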

Page 53: Information-Rich Programming in F# with Semantic Data


Future work wrt SchemEX

Further exploration of

• schema induction

• query federation

Federation vs Link Traversal based query execution

• Granularity of query execution

• Too fine grained: URI dereferencing

• Too expressive: SPARQL

• Sweet spot -> NPQL??

Page 54: Information-Rich Programming in F# with Semantic Data


Agenda

SchemEX

Construction of schema-based index

Schema induction

LiteQ – Language integrated types, extensions and queries for RDF graphs

Exploring Programming, Typing

Evaluation of LITEQ (NPQL) against SPARQL

Understandability

Ease of use

Page 55: Information-Rich Programming in F# with Semantic Data


Future

1. Searching for distributed data

2. Understanding distributed data

3. Intelligent queries on distributed data

4. Programming with distributed data
    • Type reuse
    • Type induction

Page 56: Information-Rich Programming in F# with Semantic Data

Web Science & Technologies

University of Koblenz ▪ Landau, Germany

Thank you for your attention!