5 June 2013 BBC Linked Data Platform Using semantic technologies to make our content more connected and more discoverable

BBC Linked Data Platform (SemTechBiz San Fran 2013)

May 08, 2015



Dave Rogers

An introduction to the BBC's Linked Data Platform, with occasional dips into the detail of the code, ontologies and queries that make it possible.
Transcript
Page 1: BBC Linked Data Platform (SemTechBiz San Fran 2013)

5 June 2013

BBC Linked Data Platform

Using semantic technologies to make our content more connected and more discoverable

Page 2: BBC Linked Data Platform (SemTechBiz San Fran 2013)

A (very) short history

✤ Dynamic Semantic Publishing

✤ BBC Sport - Transition from ‘static’ to ‘dynamic’

✤ Introduction of Semantic Technologies for World Cup 2010

✤ Raising the bar for Olympics 2012

✤ Linked Data Platform & The Creative Work

Page 3: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Olympics 2012

Athletes & Medals: from trackside to our audience

Page 4: BBC Linked Data Platform (SemTechBiz San Fran 2013)

BBC Linked Data Platform

(our logo)

Page 5: BBC Linked Data Platform (SemTechBiz San Fran 2013)

LDP: The Creative Work

Minimal Metadata

Semantically Aggregated Metadata

Triple Store

Website

Triple Store

Mobile Apps

IPTV

Open API

Page 6: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Creative Works

✤ Minimal metadata

✤ Enough non-semantic metadata to support ‘rich links’ in a wide range of applications

✤ Enough semantic metadata (tags) to support discovery through semantic queries

✤ Full metadata requires a content-type-specific metadata API

✤ Access to content requires a content API

Page 7: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Some use-cases

✤ Automated index pages/feeds

✤ Semantic navigation

✤ Semantic search

✤ A typical query:

✤ Top 10, most recent, BBC News Items about Politicians who are members of The Labour Party
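
The typical query above could be phrased in SPARQL along the following lines. This is only a sketch: the `pol:` namespace and its `Politician` and `memberOf` terms are hypothetical placeholders, and the `cwork:` namespace URI and the `cwork:NewsItem` class are assumptions rather than something the slides state.

```scala
object TypicalQuery {
  // Hypothetical: pol: and its terms are placeholders for a politics ontology.
  // Assumed: the cwork: namespace URI and the cwork:NewsItem class.
  def latestNewsAboutMembersOf(partyUri: String, limit: Int = 10): String =
    s"""PREFIX cwork: <http://www.bbc.co.uk/ontologies/creativework/>
       |PREFIX pol:   <http://example.org/politics/>
       |
       |SELECT DISTINCT ?item ?title ?modified
       |WHERE {
       |  ?item a cwork:NewsItem ;
       |        cwork:title ?title ;
       |        cwork:dateModified ?modified ;
       |        cwork:about ?person .
       |  ?person a pol:Politician ;
       |          pol:memberOf <$partyUri> .
       |}
       |ORDER BY DESC(?modified)
       |LIMIT $limit""".stripMargin
}
```

The party URI is inserted as-is, so a real caller would need to validate it before building the query.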

Page 8: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Powered by LDP

BBC Sport

BBC Music

BBC Olympics 2012

BBC Knowledge & Learning Beta

BBC News Local Beta

BBC Sport Mobile App

Page 9: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Creative Work Ontology

Page 10: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Creative Works in Code

case class CreativeWork(
  locators: Set[Locator],
  title: String,
  modified: DateTime,
  format: Option[FormatType.FormatType] = None,
  created: Option[DateTime] = None,
  uri: Option[String] = None,
  primaryContentOf: List[PrimaryContentOf] = List(),
  about: List[String] = List(),
  mentions: List[String] = List(),
  `type`: CreativeWorkType = CreativeWorkType.CreativeWork,
  provenance: Option[CreativeWorkProvenance] = None,
  thumbnails: List[Thumbnail] = List(),
  audience: Option[AudienceType] = None,
  category: Option[CreativeWorkCategory] = None) {

  // Invariants, evaluated once when the instance is constructed
  private val oneLocatorPerType = locators.groupBy(_.`type`).forall(_._2.size == 1)
  private val allLocatorsDistinct = locators.map(_.uri).size == locators.size

  require(title.trim.nonEmpty, "Creative Work has an empty title")
  require(title.length <= CreativeWork.MaxTitleLength,
    "Creative Work title exceeded the maximum length allowed of " + CreativeWork.MaxTitleLength)
  require(oneLocatorPerType, "Creative Work contained multiple Locators of the same type")
  require(allLocatorsDistinct, "Creative Work contained multiple identical Locator URNs")

  // The GUID is the Thing URI with its prefix and "#id" fragment removed
  def guid = uri.map(_.replace("http://www.bbc.co.uk/things/", "")).map(_.replace("#id", ""))
}

object CreativeWork {
  val Locator = "http://www.bbc.co.uk/ontologies/cms/locator"
  val MaxTitleLength = 300
}
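
To see how the constructor-time checks behave, here is a cut-down, self-contained variant of the class. `Work` and its two-field `Locator` are hypothetical stand-ins for illustration, not the platform's real types.

```scala
import scala.util.Try

object CreativeWorkSketch {
  // Hypothetical stand-in for the real Locator type
  final case class Locator(`type`: String, uri: String)

  // Cut-down version of CreativeWork keeping two of the invariants
  final case class Work(
    locators: Set[Locator],
    title: String,
    uri: Option[String] = None) {

    private val oneLocatorPerType =
      locators.groupBy(_.`type`).forall(_._2.size == 1)

    // require throws IllegalArgumentException when its condition is false
    require(title.trim.nonEmpty, "Creative Work has an empty title")
    require(oneLocatorPerType, "Creative Work contained multiple Locators of the same type")

    def guid: Option[String] =
      uri.map(_.replace("http://www.bbc.co.uk/things/", "").replace("#id", ""))
  }

  // Wrapping construction in Try turns a failed invariant into a value
  def build(locators: Set[Locator], title: String, uri: Option[String] = None): Try[Work] =
    Try(Work(locators, title, uri))
}
```

Because the checks run in the constructor, an invalid Creative Work can never exist as a value. Callers that prefer not to catch exceptions can go through something like `build`.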

Page 11: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Creative Work Query*

CONSTRUCT {
  ?creativeWork a cwork:CreativeWork ;
    a ?type ;
    cwork:title ?title ;
    cwork:about ?about ;
    cwork:mentions ?mentions ;
    cwork:dateModified ?modified .
  ?about bbc:preferredLabel ?aboutPreferredLabel .
  ?mentions bbc:preferredLabel ?mentionsPrefLabel .
}
WHERE {
  {
    SELECT DISTINCT ?creativeWork
    WHERE {
      {{#about}}
        FILTER (?about = <{{about}}>) .
        ?creativeWork cwork:about ?about .
      {{/about}}
      {{#mentions}}
        FILTER (?mentions = <{{mentions}}>) .
        ?creativeWork cwork:mentions ?mentions .
      {{/mentions}}
      ?creativeWork a cwork:CreativeWork ;
        a ?type ;
        cwork:title ?title ;
        cwork:dateModified ?modified .
    }
    ORDER BY DESC(?modified)
    LIMIT 10
    {{#offset}}OFFSET {{offset}}{{/offset}}
  }
  ?creativeWork a cwork:CreativeWork .
  {
    ?creativeWork a cwork:CreativeWork ;
      a ?type ;
      cwork:title ?title ;
      cwork:dateModified ?modified .
    { ?type rdfs:subClassOf cwork:CreativeWork . }
    UNION
    {
      OPTIONAL {
        ?creativeWork cwork:about ?about .
        OPTIONAL { ?about rdfs:label ?aboutLabel . }
        OPTIONAL { ?about bbc:preferredLabel ?aboutPreferredLabel . }
      }
      OPTIONAL {
        ?creativeWork cwork:mentions ?mentions .
        OPTIONAL { ?mentions rdfs:label ?mentionsLabel . }
        OPTIONAL { ?mentions bbc:preferredLabel ?mentionsPrefLabel . }
      }
    }
  }
}

*Simplified

SPARQL CONSTRUCT

Inner SELECT

Parameterisation

Pagination

Mustache-templated
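
The parameterisation and pagination callouts rely on Mustache sections: a `{{#name}}...{{/name}}` block is emitted only when `name` is supplied. The sketch below shows that behaviour with a hand-rolled renderer. `TemplateSketch.render` is a hypothetical stand-in for a real Mustache library, and it handles only flat (non-nested) sections and plain string values.

```scala
import scala.util.matching.Regex

object TemplateSketch {
  // {{#name}} ... {{/name}}: kept only when `name` is present in params
  private val Section: Regex = """(?s)\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}""".r
  // {{name}}: replaced by the parameter value (empty string if absent)
  private val Variable: Regex = """\{\{(\w+)\}\}""".r

  def render(template: String, params: Map[String, String]): String = {
    val withSections = Section.replaceAllIn(template, m =>
      if (params.contains(m.group(1))) Regex.quoteReplacement(m.group(2)) else "")
    Variable.replaceAllIn(withSections, m =>
      Regex.quoteReplacement(params.getOrElse(m.group(1), "")))
  }

  // A cut-down template in the style of the slide's inner SELECT
  val innerSelect: String =
    """SELECT DISTINCT ?creativeWork WHERE {
      |  {{#about}}?creativeWork cwork:about <{{about}}> .{{/about}}
      |  ?creativeWork a cwork:CreativeWork ; cwork:dateModified ?modified .
      |}
      |ORDER BY DESC(?modified)
      |LIMIT 10
      |{{#offset}}OFFSET {{offset}}{{/offset}}""".stripMargin
}
```

Rendering with `Map("offset" -> "20")` produces the third page of ten results. Omitting `offset` drops the OFFSET clause altogether. Values are substituted verbatim here, so anything user-supplied would need validating before it reaches the template.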

Page 12: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Our principal challenge:

Data Management

Page 13: BBC Linked Data Platform (SemTechBiz San Fran 2013)

4 Kinds of Data

✤ Creative Works

✤ Reference Data, managed in sets (Datasets)

✤ Reference Data, managed individually (Resources)

✤ Ontologies

Page 14: BBC Linked Data Platform (SemTechBiz San Fran 2013)

99.99% Availability
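
For a sense of scale, the target can be turned into a downtime budget. The slide gives no measurement window, so the 365-day year used below is an assumption, and `allowedDowntimeMinutes` is a hypothetical helper.

```scala
object Availability {
  // Minutes of downtime permitted by an availability target over a period.
  // The period length is an assumption; the slide does not state one.
  def allowedDowntimeMinutes(availability: Double, periodDays: Int): Double =
    (1.0 - availability) * periodDays * 24 * 60
}
```

Under that assumption, `Availability.allowedDowntimeMinutes(0.9999, 365)` comes to roughly 52.6 minutes per year.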

Page 15: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Our own URIs

✤ Everything has a ‘Thing URI’:

✤ http://www.bbc.co.uk/things/{GUID}#ID

✤ Opaque ID, dereferenceable*

✤ BBC controls identity, therefore quality & consistency

✤ bbc:sameAs to DBpedia, Wikidata, Freebase etc

*coming soon
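
A minimal sketch of minting and parsing Thing URIs follows. It assumes the GUID is a standard UUID and uses the lowercase `#id` fragment that the `guid` method on the "Creative Works in Code" slide strips. `ThingUri` itself is a hypothetical helper.

```scala
import java.util.UUID
import scala.util.Try

object ThingUri {
  private val Prefix = "http://www.bbc.co.uk/things/"
  // Assumption: lowercase fragment, as in the Scala code on the earlier slide
  private val Fragment = "#id"

  // Mint a Thing URI from an opaque identifier (assumed to be a UUID)
  def fromGuid(guid: UUID): String = s"$Prefix$guid$Fragment"

  // Recover the identifier; None if the string is not a well-formed Thing URI
  def guidOf(uri: String): Option[UUID] =
    if (uri.startsWith(Prefix) && uri.endsWith(Fragment))
      Try(UUID.fromString(
        uri.substring(Prefix.length, uri.length - Fragment.length))).toOption
    else None
}
```

Because the identifier is opaque, nothing about the concept (its label, type or domain) leaks into the URI, which is what lets the BBC rename or reclassify a concept without changing its identity.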

Page 16: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Our own ontologies

✤ Core set of ontologies that are BBC owned

✤ Creative Work, BBC, (Organisational) Provenance, etc

✤ Ability to change regularly and unilaterally

✤ Provide ‘mappings’ to more widely used ontologies (e.g. Schema.org)

✤ Domain ontologies can be shared or reused

✤ Sport, Politics, GeoLocation, etc
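
One way to picture the 'mappings' bullet is a lookup table that rewrites BBC terms into a more widely used vocabulary. The table below is purely illustrative. The schema.org terms exist, but the pairings (for example `cwork:title` to `schema:headline`) and the `cwork:` namespace URI are assumptions, not the BBC's published mappings.

```scala
object SchemaOrgMapping {
  final case class Triple(s: String, p: String, o: String)

  // Assumed namespace for the Creative Work ontology
  val Cwork = "http://www.bbc.co.uk/ontologies/creativework/"
  val Schema = "http://schema.org/"

  // Hypothetical term-to-term mapping, for illustration only
  val termMap: Map[String, String] = Map(
    (Cwork + "CreativeWork") -> (Schema + "CreativeWork"),
    (Cwork + "title")        -> (Schema + "headline"),
    (Cwork + "about")        -> (Schema + "about"),
    (Cwork + "mentions")     -> (Schema + "mentions"),
    (Cwork + "dateModified") -> (Schema + "dateModified")
  )

  // Rewrite predicate and object where a mapping exists; leave the rest as-is
  def translate(t: Triple): Triple =
    Triple(t.s, termMap.getOrElse(t.p, t.p), termMap.getOrElse(t.o, t.o))
}
```

Keeping the mapping outside the core ontology means the BBC-owned terms can change unilaterally while only the table needs updating for external consumers.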

Page 17: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Open data

✤ Provided through Mashery

✤ ‘Connected Studio’ events will validate our API

✤ Public beta to follow

✤ JSON-LD & Turtle

✤ Future

✤ Self-provisioned, cloud-based triple stores

✤ Data Dumps

Page 18: BBC Linked Data Platform (SemTechBiz San Fran 2013)

The Hard Problems...

Page 19: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Managing concepts across BBC

✤ Which domain ‘owns’ Arnold Schwarzenegger?

✤ News? Entertainment? History? Politics?

✤ Can domains ‘own’ predicates?

✤ Layering information over shared concepts

✤ High quality sub-sets vs. lower quality ‘long-tail’

✤ Synchronisation with external datasets

✤ Tools for creating and managing concepts

✤ Emerging, splitting & combining concepts

✤ Linked Data gives us a language to solve these problems

Page 20: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Metadata

Often subjective, never complete

✤ What is this TV programme about?

✤ Manual tag curation

✤ Subjective

✤ Long-term expense

✤ Inconsistent

✤ Automated tag generation

✤ Short-term expense

✤ Value in data or algorithm?

✤ Complex

✤ Relies on assumptions

✤ Our approach? Invest in both. Validate learnings.

Page 21: BBC Linked Data Platform (SemTechBiz San Fran 2013)

When to reason?

✤ Our options...

✤ Before writing to the triple store

✤ Materialised in the triple store (Forward-chaining inference)

✤ Inferred by the SPARQL engine (Backward-chaining inference)

✤ After SPARQL results have returned

✤ None/some/all of the above
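
The difference between the two middle options can be sketched on a tiny in-memory graph, using `rdfs:subClassOf` as the only rule. This is a toy illustration with hypothetical names (`ReasoningSketch`, `materialise`, `hasType`). A real triple store or SPARQL engine would do this work internally.

```scala
object ReasoningSketch {
  final case class Triple(s: String, p: String, o: String)

  val Type = "rdf:type"
  val SubClassOf = "rdfs:subClassOf"

  // Forward chaining: add every entailed rdf:type triple up front,
  // repeating until no new triples appear (a fixpoint).
  def materialise(triples: Set[Triple]): Set[Triple] = {
    val inferred = for {
      t  <- triples if t.p == Type
      sc <- triples if sc.p == SubClassOf && sc.s == t.o
    } yield Triple(t.s, Type, sc.o)
    val next = triples ++ inferred
    if (next == triples) triples else materialise(next)
  }

  // Backward chaining: store nothing extra; walk the subclass
  // hierarchy only when a question is asked.
  def hasType(triples: Set[Triple], x: String, c: String): Boolean = {
    // Returns `seen` plus k and all of k's ancestors; `seen` guards against cycles
    def superClasses(k: String, seen: Set[String]): Set[String] =
      if (seen(k)) seen
      else triples
        .filter(t => t.p == SubClassOf && t.s == k)
        .map(_.o)
        .foldLeft(seen + k)((acc, parent) => superClasses(parent, acc))

    triples
      .filter(t => t.p == Type && t.s == x)
      .map(_.o)
      .exists(asserted => superClasses(asserted, Set.empty).contains(c))
  }
}
```

Materialising makes reads cheap but means writes and ontology changes must recompute inferred triples. Query-time inference keeps the store small but pays the cost on every read.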

Page 22: BBC Linked Data Platform (SemTechBiz San Fran 2013)

Maturity of Semantic Tech

✤ From a Software Industry perspective, Semantic (RDF) Technology is not mainstream and is therefore hard to sell

✤ Library/application immaturity can be a hindrance to innovation

✤ I believe the Sem Tech industry needs to focus on simplicity and abstraction

✤ Semantic Technology is complex, but using it need not be