Open Web Data for Education
Linked Data technologies for connecting open educational data
Mathieu d’Aquin, Philippe Cudré-Mauroux, Besnik Fetahu, Marieke Guy
The Open University, University of Fribourg, L3S Hanover, Open Knowledge Foundation
@mdaquin @FetahuBesnik @mariekeguy
Slides at: http://slideshare.net/mdaquin
Fostering Open Curricula and Agile Knowledge Bases for Europe’s Higher Education Landscape
• The Bowlogna ontology
• Extending & managing Bowlogna data
– Entity-centric data management
The Bologna Reform
• Started in June 1999
• Framework for higher education systems
• 47 Countries
• Common academic degrees
• Common study structure
• Common terminology
The university setting after Bologna
• A lot of data is available
– Not following standard schemas
– Comprehensive and available data is a success factor
• Shared data
– Erasmus exchanges
– Courses in a given language
• Analytic tools may help monitoring university performance
An ontology about Bologna
• A Lexicon for the Bologna Reform
– Basic set of terms for the new system
– Stable across time and institutions
– Developed by a professional terminologist
The ontology creation process
• The Bowlogna Ontology
– 29 top classes (67 in total)
– Classes: student, professor, evaluation, teaching unit, ECTS credit, semester, etc.
– Concept definitions in English, French, German
Bowlogna Ontology
• Private / Public parts
– Public data can be shared with other universities (e.g., course descriptions)
– Private data is sensitive (e.g., evaluation results)
• Private data might contain more instances
• Aggregations over private data may be shared
(e.g., number of enrolled students)
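The public/private split above can be illustrated with a minimal sketch. The record layout, the `visibility` flag, and the function name `shareable_view` are assumptions made for illustration and are not part of the Bowlogna ontology itself.

```python
from collections import Counter

# Hypothetical records; field names and visibility flags are assumptions.
RECORDS = [
    {"kind": "course_description", "course": "Databases", "visibility": "public"},
    {"kind": "enrollment", "course": "Databases", "student": "s1", "visibility": "private"},
    {"kind": "enrollment", "course": "Databases", "student": "s2", "visibility": "private"},
    {"kind": "evaluation", "course": "Databases", "student": "s1", "visibility": "private"},
]


def shareable_view(records):
    """Return what may leave the university: public records plus aggregates."""
    # Public instances are shared as they are.
    public = [r for r in records if r["visibility"] == "public"]
    # Private enrollment instances stay local; only their count per course is shared.
    enrolled = Counter(r["course"] for r in records if r["kind"] == "enrollment")
    return {"public": public, "enrolled_students": dict(enrolled)}


view = shareable_view(RECORDS)
print(view["enrolled_students"])
```

No individual student identifier or evaluation result appears in the shared view, only the aggregated enrollment count.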
Managing Bowlogna Data
• Entity-Centric Data Management
– Searching for entities
– Linking entities
– Typing entities
– Storing entities
Entities as Mediation
• Rising paradigm
– Store information at the entity granularity
– Integrate information by inter-linking entities
• Advantages?
– Coarser granularity compared to keywords
• More natural, e.g., brain functions similarly (or is it the other way around?)
• Easier to integrate 3rd party information
– Denormalized information compared to RDBMSs
• Schema-later, heterogeneity, sparsity
• Pre-computed joins, “Semantic” linking
• Drawbacks?
Searching for Entities (1)
[Figure: example entity graph]
TheDescendants: title “The Descendants”; type
GeorgeClooney: name “George Clooney”; dateOfBirth “May 6, 1961”; type; playsIn TheDescendants
ShaileneW: name “Shailene Woodley”; dateOfBirth “Nov. 15, 1991”; type; playsIn TheDescendants
• Main idea: combine unstructured and structured search
– Inverted index to locate first candidates
– Graph queries to refine the results
• Graph traversals (queries on object properties)
• Graph neighborhoods (queries on data type properties)
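The two-stage idea above can be sketched on the slide's example graph. This is a toy illustration under stated assumptions: the triple layout, the whitespace tokenizer, and the function names are hypothetical, and a real system would use a full-text engine and SPARQL rather than Python loops.

```python
from collections import defaultdict

# Toy graph based on the slide's example; the triple layout is an assumption.
TRIPLES = [
    ("TheDescendants", "title", "The Descendants"),
    ("GeorgeClooney", "name", "George Clooney"),
    ("GeorgeClooney", "dateOfBirth", "May 6, 1961"),
    ("GeorgeClooney", "playsIn", "TheDescendants"),
    ("ShaileneW", "name", "Shailene Woodley"),
    ("ShaileneW", "dateOfBirth", "Nov. 15, 1991"),
    ("ShaileneW", "playsIn", "TheDescendants"),
]
ENTITIES = {s for s, _, _ in TRIPLES}


def build_inverted_index(triples):
    """Map each lowercased token of a literal value to the entities carrying it."""
    index = defaultdict(set)
    for s, _, o in triples:
        if o not in ENTITIES:  # literal value (datatype property)
            for token in o.lower().split():
                index[token].add(s)
    return index


def neighborhood(entity, triples):
    """Graph neighborhood: the datatype properties of one entity."""
    return {p: o for s, p, o in triples if s == entity and o not in ENTITIES}


def search(keywords, triples, index):
    """Stage 1: inverted index lookup. Stage 2: one-hop graph traversal."""
    candidates = set()
    for kw in keywords.lower().split():
        candidates |= index.get(kw, set())
    results = set(candidates)
    for s, _, o in triples:
        if o in ENTITIES:  # object property, followed in both directions
            if s in candidates:
                results.add(o)
            if o in candidates:
                results.add(s)
    return candidates, results


INDEX = build_inverted_index(TRIPLES)
candidates, results = search("clooney", TRIPLES, INDEX)
print(sorted(candidates), sorted(results))
```

The keyword lookup alone finds only the actor; the traversal over `playsIn` then adds the film, whose neighborhood supplies its title.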
[Figure labels: Keywords, HTTP, Inverted Index, DBMS, SPARQL]
Searching for Entities (2)
[Figure: entity search architecture]
• index(): LOD Cloud, Inverted Index, Structured Inverted Index, RDF Store
• query(): User, Keyword Query, Entity Search
• Query Annotation and Expansion: WordNet, 3rd party search engines, Pseudo-Relevance Feedback
• Ranking Functions producing intermediate top-k results
• Graph Traversals (queries on object properties)
• Neighborhoods (queries on datatype properties)
• Graph-Enriched Results, Final Ranking Function
Linking Entities (1)
• ZenCrowd: linking textual content to entities
• Uses sets of algorithmic matchers to match entities to online concepts
• Uses dynamic templating to create micro-matching-tasks and publish them on MTurk
• Combines both algorithmic and human matchers using probabilistic networks
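How votes from algorithmic and human matchers might be combined can be sketched with a deliberately simplified model. The code below is a naive-Bayes style combination that assumes independent matchers with known reliabilities; it stands in for, and is much simpler than, the probabilistic network mentioned above. The function name and the reliability values are hypothetical.

```python
def link_probability(votes, prior=0.5):
    """Posterior probability that a candidate link is correct.

    votes: iterable of (vote, reliability) pairs, where vote is True if the
    matcher accepts the link and reliability (strictly between 0 and 1) is
    the assumed probability that this matcher votes correctly.
    Matchers are assumed independent, which is a simplification.
    """
    odds = prior / (1.0 - prior)
    for vote, reliability in votes:
        ratio = reliability / (1.0 - reliability)
        # An accepting vote multiplies the odds, a rejecting vote divides them.
        odds *= ratio if vote else 1.0 / ratio
    return odds / (1.0 + odds)


# One algorithmic matcher and two crowd workers (reliabilities are made up).
votes = [(True, 0.7), (True, 0.9), (False, 0.6)]
print(round(link_probability(votes), 3))
```

A decision step could then accept the link when the posterior exceeds a chosen threshold, and route uncertain cases to more workers.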
Linking Entities (2)
[Figure: ZenCrowd architecture]
• Input: HTML Pages
• Output: HTML + RDFa Pages
• ZenCrowd components: Entity Extractors, Algorithmic Matchers, LOD Index (Get Entity), Probabilistic Network, Decision Engine, Micro-Task Manager
• External: LOD Open Data Cloud, Crowdsourcing Platform (Micro Matching Tasks, Workers Decisions)
Storing Entities (1)
• Fundamental impedance mismatch between
graphs of entities and…
– N-ary / decomposition storage model
– Inverted Indices
– Key-value paradigms
Storing Entities (2)
• dipLODocus[RDF]
– Materialize the joins!
– Dense-pack the values
– Provide new indices
– Co-locate
– Co-locate
– Co-locate
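The "materialize the joins" and "co-locate" ideas can be illustrated with a small sketch. It groups triples by subject and inlines referenced entities so that a lookup needs no join at query time. This is an assumption-laden toy, not the actual dipLODocus[RDF] storage layout; the function names and the sample triples are hypothetical.

```python
from collections import defaultdict


def colocate(triples):
    """Group all (predicate, object) pairs of a subject into one cluster."""
    clusters = defaultdict(list)
    for s, p, o in triples:
        clusters[s].append((p, o))
    return dict(clusters)


def materialize(clusters, root, depth=1):
    """Inline the clusters of entities referenced from root, up to `depth` hops.

    The depth bound also keeps cyclic graphs from recursing forever.
    """
    record = {"@id": root}
    for p, o in clusters.get(root, []):
        if depth > 0 and o in clusters:
            value = materialize(clusters, o, depth - 1)  # pre-computed join
        else:
            value = o  # literal, or entity beyond the depth bound
        record.setdefault(p, []).append(value)
    return record


TRIPLES = [
    ("student1", "enrolledIn", "course1"),
    ("course1", "title", "Databases"),
    ("course1", "ects", "6"),
]
print(materialize(colocate(TRIPLES), "student1"))
```

The trade-off is the usual one for denormalization: reads touch one co-located record, while updates to a shared entity must reach every copy.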
Typing Entities
[Figure: type ranking pipeline]
• Text extraction (BoilerPipe)
• Named Entity Recognition (Stanford NER) ⟹ list of entity labels
• Entity linking, foreach label (inverted index: DBpedia labels ⟹ resource URIs) ⟹ list of entity URIs
• Type retrieval (inverted index: resource URIs ⟹ type URIs) ⟹ list of type URIs
• Type ranking ⟹ ranked list of types
TRank
• Input: a knowledge base G, an entity e, a context c in which e appears.
• Output: e’s types ranked by relevance with respect to the context c.
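The input/output contract above can be made concrete with a toy sketch. The ranking heuristic used here (prefer types shared with other entities in the same context) is only one plausible context-based choice and is an assumption, not a description of the published method. The two lookup tables stand in for the inverted indices, and all `ex:` identifiers are made up.

```python
from collections import Counter

# Stand-ins for the two inverted indices; all identifiers are hypothetical.
LABEL_TO_URI = {
    "george clooney": "ex:George_Clooney",
    "shailene woodley": "ex:Shailene_Woodley",
}
URI_TO_TYPES = {
    "ex:George_Clooney": ["ex:Person", "ex:Actor", "ex:Director"],
    "ex:Shailene_Woodley": ["ex:Person", "ex:Actor"],
}


def link(labels):
    """Entity linking: map recognized labels to resource URIs, skipping unknowns."""
    return [LABEL_TO_URI[l.lower()] for l in labels if l.lower() in LABEL_TO_URI]


def rank_types(entity_uri, context_uris):
    """Rank the entity's types by how many other context entities share them."""
    support = Counter()
    for uri in context_uris:
        if uri != entity_uri:
            support.update(URI_TO_TYPES.get(uri, []))
    types = URI_TO_TYPES.get(entity_uri, [])
    # Higher support first; ties broken alphabetically for determinism.
    return sorted(types, key=lambda t: (-support[t], t))


uris = link(["George Clooney", "Shailene Woodley", "Unknown Name"])
print(rank_types("ex:George_Clooney", uris))
```

In a context that mentions another actor, the types shared with that entity rise above the unshared `ex:Director`.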
References
• The Bowlogna ontology: Semantic Web J. 2013
• Searching for entities: SIGIR 2012
• Linking entities: WWW 2012, VLDB J. 2013
• Storing entities: ISWC 2011
• Typing entities: ISWC 2013
Pause
What else needs representing in educational data?
What to do with it
• Resource discovery
• Research
• Exploration
• Social
Example: UK HESA/UNISTAT Key Information Set
http://www.hesa.ac.uk/unistatsdata
“Unistats, which incorporates the KIS, provides course level information on all undergraduate higher education courses provided in the UK, which are of at least one year’s duration and consist of 120 or more credits of study” [1]
Includes statistics about the success rate of degrees (courses), the type of assessment, and what students do afterwards (further study, jobs).
Need to download the data, unzip it, parse the XML, re-interpret it into your own model, store the data, provide a querying facility, and finally build the application.
Doing it as linked data with a SPARQL endpoint does that once for everybody!
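With a SPARQL endpoint, the consumer's side shrinks to sending one query. The sketch below only builds the HTTP request, following the SPARQL protocol convention of a `query` parameter on a GET request. The endpoint URL, the `ex:` vocabulary, and the query shape are hypothetical placeholders, not the actual Unistats/KIS linked data service.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and vocabulary, used for illustration only.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX ex: <http://example.org/kis#>
SELECT ?course ?title WHERE {
  ?course a ex:Course ;
          ex:title ?title .
} LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url[:60])
# Sending it would be: urllib.request.urlopen(request), omitted here because
# the endpoint above does not exist.
```

Every application can reuse the same endpoint, so the download, unzip, parse, and remodel steps are done once by the publisher.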
Domain                  Datasets  Triples         %       Links        %
Life sciences           41        3,036,336,004   9.60 %  191,844,090  38.06 %
User-generated content  20        134,127,413     0.42 %  3,449,143    0.68 %
Total                   295       31,634,213,770          503,998,829
and many more languages (16)…
and many more organisations (184)…
The Big Picture: How to find the right information?
17/11/13 LinkedUp – Besnik Fetahu 51
How to find information about “renewable energy”?
Search into individual resources in all these sources?
338 sources of information
~300 million individual resources
- Manual inspection costly!
- Current infrastructure is not reliable for such large scale queries!
Now what?
Generate representative topics for the individual data sources
Topics linking the data sources into a central and interlinked graph
Explore the graph for specific concepts, e.g. “renewable energy”
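The plan above (profile each source with topics, link sources through shared topics, then explore by concept) can be sketched as a small bipartite graph. The dataset names and topic sets are invented for illustration, and how topics are actually extracted from a source is outside this sketch.

```python
from collections import defaultdict

# Hypothetical topic profiles: data source -> representative topics.
PROFILES = {
    "dataset_a": {"renewable energy", "biofuel"},
    "dataset_b": {"linux", "operating systems"},
    "dataset_c": {"renewable energy", "environment"},
}


def build_topic_graph(profiles):
    """Invert the profiles so each topic points to the sources it describes."""
    topic_to_datasets = defaultdict(set)
    for dataset, topics in profiles.items():
        for topic in topics:
            topic_to_datasets[topic].add(dataset)
    return topic_to_datasets


def datasets_for(graph, concept):
    """Sources worth querying for a concept, without scanning every resource."""
    return sorted(graph.get(concept, set()))


def linked_datasets(profiles, dataset):
    """Other sources connected to `dataset` through at least one shared topic."""
    topics = profiles[dataset]
    return sorted(d for d, t in profiles.items() if d != dataset and topics & t)


GRAPH = build_topic_graph(PROFILES)
print(datasets_for(GRAPH, "renewable energy"))
```

The lookup narrows hundreds of sources down to the few whose profiles mention the concept, which is where resource-level queries can then be sent.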
Constructing Topic Profiles
[Figure: a data source, its resource types, and its individual resources]
The types of information existing in the data source: book, thesis, proceedings, series, audio document, manuscript, newspaper, report
Individual resources:
Linux in wenigen Stunden beherrschen ; absolut keine Vorkenntnisse nötig! ; ideal für Einsteiger und Umsteiger ; Animationen, Videos und Sprachausg. erklären LINUX Schritt für Schritt. (English: "Master Linux in a few hours; absolutely no prior knowledge needed!; ideal for beginners and switchers; animations, videos and speech output explain Linux step by step.")
"British Association for Biofuels and Oils" The prime objective of the Association is to persuade Government to modify the tax on Biodiesel so as to give this splendidly 'green' fuel a chance to establish itself to the advantage of the environment. This means a tax structure which ensures that the pump price of Biodiesel is at least competitive with fossil diesel. A second objective is to see established in Britain a Biodiesel plant of sufficient size to get the appropriate economies of scale in production costs.
organization
Linux in wenigen Stunden beherrschen ; absolut keine
Vorkenntnisse nötig! ; ideal für Einsteiger und Umsteiger;