Hack U Barcelona 2011

Post on 15-Jan-2015

DESCRIPTION

Very brief intro to Semantic Web and BOSS for a Yahoo! Hack U event at UPC in Barcelona, Spain.

Transcript

Fun with the Semantic Web

Peter Mika

Yahoo! Research Barcelona

pmika@yahoo-inc.com

- 2 -

Vague, but exciting… Berners-Lee and the dawn of the Web

- 3 -

Semantic Web

• Publish data on the Web
  – Linked Data: linking data similar to how we link documents on the Web
  – Query databases over the Web

• Architectural challenges
  – A common format for sharing data
  – Sharing the meaning of data
  – Infrastructure

• Semantic Web standards from W3C
  – Data and schema languages (RDF, OWL, RIF)
  – Document formats (RDF/XML, RDFa)
  – Protocols (SPARQL, HTTP)

• Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics

• Community effort (Linked Data movement)

- 4 -

RDF (Resource Description Framework)

• The basic data model of the Semantic Web
  – A universal model to capture all sorts of data: networks, relational, object-oriented…

• Basic unit of information is a triple
  – A tuple of (subject, predicate, object)
  – Example: (Joe, loves, Mary)
  – Each triple gives the value of a property for a given resource or relates two objects to one another
    • Object is either a resource or a literal

• An RDF model is a set of triples
  – Ordering of statements in an RDF document is irrelevant (unlike XML)
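The "set of triples" idea can be sketched in a few lines of Python. This is only an illustration, not an RDF library: plain strings stand in for URIs, and the helper names make_model and objects are made up for this example.

```python
# Minimal sketch: an RDF-like model as a set of (subject, predicate, object) tuples.
# Plain strings stand in for URIs here; real RDF would use full URIs.

def make_model(triples):
    """Return an immutable set of triples; order and duplicates are discarded."""
    return frozenset(tuple(t) for t in triples)

def objects(model, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return {o for (s, p, o) in model if s == subject and p == predicate}

# The same statements written in a different order...
doc_a = [("Joe", "loves", "Mary"), ("Joe", "name", "Joe A.")]
doc_b = [("Joe", "name", "Joe A."), ("Joe", "loves", "Mary")]

# ...produce the same model, because a model is a set.
model_a = make_model(doc_a)
model_b = make_model(doc_b)
```

Comparing model_a and model_b with == shows why statement order in a document does not matter.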

- 5 -

Resources vs. literals

• Resources are identified by a URI; otherwise they are called blank nodes
  – URIs are a generalization of URLs
  – Notation: <http://www.example.org/Person> or ex:Person

• Literals have an optional language and datatype (string, integer etc.)
  – Literals cannot be subjects of statements
  – Datatypes are identified by URIs, e.g. XML Schema datatypes
  – Two literals are the same if their components are the same
  – Notation: “Joe B.” or Joe@en^^http://…#string
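A rough way to picture the distinction in code, assuming a simplified model (blank nodes are left out, and the class and function names are invented for this sketch): frozen dataclasses compare field by field, which matches "two literals are the same if their components are the same".

```python
from dataclasses import dataclass
from typing import Optional

# Datatype URI for strings from XML Schema.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

@dataclass(frozen=True)
class Resource:
    """A resource identified by a URI (blank nodes are omitted in this sketch)."""
    uri: str

@dataclass(frozen=True)
class Literal:
    """A literal value with optional language tag and datatype URI."""
    value: str
    language: Optional[str] = None
    datatype: Optional[str] = None

def is_valid_triple(subject, predicate, obj):
    """Literals may only appear in the object position."""
    return (
        isinstance(subject, Resource)
        and isinstance(predicate, Resource)
        and isinstance(obj, (Resource, Literal))
    )

joe = Resource("http://www.example.org/Joe")    # hypothetical URI
name = Resource("http://www.example.org/name")  # hypothetical URI
```

With these definitions, Literal("Joe B.", language="en") equals another literal built from the same components, but not Literal("Joe B.") without the language tag.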

- 6 -

Graphical and textual notation

• A number of ways to serialize an RDF model into an RDF document
  – RDF/XML, Turtle, N3, N-Triples
  – Example: http://www.cs.vu.nl/~pmika/foaf.rdf

[Graph: my:Joe has a name arc to “Joe A.” and a type arc to foaf:Person]
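As a hedged illustration of the textual side, the two statements about my:Joe could be written out in an N-Triples-like form as below. The URI chosen for the my: prefix is a made-up placeholder, the name property is assumed to be foaf:name, and the escaping handles only backslashes and double quotes, so this is a sketch rather than a complete serializer.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
MY = "http://www.example.org/my#"  # hypothetical namespace for the my: prefix

def term(t):
    """Format a ("uri", value) or ("lit", value) pair as an N-Triples term."""
    kind, value = t
    if kind == "uri":
        return f"<{value}>"
    # Simplified escaping: only backslash and double quote are handled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize triples one per line; sorted only to make the output stable."""
    lines = sorted(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)
    return "\n".join(lines)

joe_graph = {
    (("uri", MY + "Joe"), ("uri", FOAF + "name"), ("lit", "Joe A.")),
    (("uri", MY + "Joe"), ("uri", RDF_TYPE), ("uri", FOAF + "Person")),
}

document = to_ntriples(joe_graph)
```

Each line of document is one triple terminated by a period, which is the whole idea of N-Triples: one statement per line, no nesting.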

- 7 -

RDF is designed for the Web

• URIs provide web-wide global identification across datasets
  – A resource may be described by multiple documents
  – We know it’s the same resource because the same URI is used or through reasoning (advanced topic…)
  – URIs are intended to be reused
  – Unique, but not single identifiers: two URIs may denote the same thing

• URIs can be retrieved from the Web
  – A well-behaved URI returns a description of the resource
  – Provides authority: the definition of foaf:Person lives at that URI

• Ontologies can be looked up as well
  – Typically at the root of the URIs, also known as the namespace
  – Example: http://xmlns.com/foaf/0.1/Person redirects to the specification
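To make "the root of the URIs, also known as the namespace" concrete, here is a small sketch that splits a URI at its last "#" or "/". The function name split_uri is invented for this example, and the split rule is a common convention rather than something the RDF specs mandate.

```python
def split_uri(uri):
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    if cut == -1:
        # No separator found: treat the whole string as the local name.
        return "", uri
    return uri[:cut + 1], uri[cut + 1:]

foaf_ns, foaf_local = split_uri("http://xmlns.com/foaf/0.1/Person")
rdf_ns, rdf_local = split_uri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
```

For the FOAF example this yields the namespace http://xmlns.com/foaf/0.1/ and the local name Person, which is what the prefixed form foaf:Person abbreviates.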

- 8 -

URIs implicitly link data together

Joe’s homepage: (#joe, #name, “Joe A.”) (#joe, #email, mailto:joe@joe.com)

A dating site: (#joe, #loves, #mary)

Mary’s homepage: (#mary, #name, “Mary B.”) (#mary, #gender, “female”)

Schema doc: (#name, #type, #Property) (#name, #domain, #Person)

- 9 -

Put together, triples form a single ‘global’ graph

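[Graph: the nodes #joe and #mary connected by the following edges]

(#joe, #name, “Joe A.”)
(#joe, #email, “joe@joe.com”)
(#joe, #loves, #mary)
(#mary, #name, “Mary B.”)
(#mary, #gender, “female”)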
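One way to see the "single global graph" is as a set union of the triples published by each source. The sketch below assumes that #joe, #mary and the property names are shorthand for globally shared URIs (in real RDF a bare fragment would resolve relative to each document), and the function name loved_names is made up for the example.

```python
# Triples as each source might publish them (identifiers are shorthand for shared URIs).
joes_homepage = {
    ("#joe", "#name", "Joe A."),
    ("#joe", "#email", "mailto:joe@joe.com"),
}
marys_homepage = {
    ("#mary", "#name", "Mary B."),
    ("#mary", "#gender", "female"),
}
dating_site = {("#joe", "#loves", "#mary")}

# Merging is just set union: shared identifiers link the data together.
global_graph = joes_homepage | marys_homepage | dating_site

def loved_names(graph, subject):
    """Names of everyone the subject loves; needs triples from several sources."""
    return {
        name
        for (s, p, o) in graph if s == subject and p == "#loves"
        for (s2, p2, name) in graph if s2 == o and p2 == "#name"
    }
```

No single source can answer "what is the name of the person Joe loves?", but the merged graph can, because the dating site and Mary's homepage use the same identifier for Mary.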

- 10 -

Linked Data

• Open your data
• Publish it in RDF, the lingua franca of the data web
• Data first, schema second
  – Worry about linking, data integration later… someone else can do it for you!
• Optionally, provide query access using the SPARQL query language and protocol
  – Powerful, SQL-like query language
  – HTTP or SOAP protocol to communicate with SPARQL servers

- 11 -

Linked Data cloud: interlinked RDF datasets on the Web

• http://linkeddata.org/

- 12 -

DBpedia

• DBpedia is a dataset that contains much of the structured data in Wikipedia
  – Data from the info-boxes
  – Links between Wikipedia pages
  – Categories
  – Disambiguation and redirect pages

• Links to other datasets

- 13 -

Fetching individual resources

• Use your web browser
  – http://dbpedia.org/resource/Yahoo redirects to http://dbpedia.org/page/Yahoo
  – You can plug this URI into other Linked Data browsers

• HTTP GET to fetch data
  – Using curl: add Accept: application/rdf+xml for RDF and enable redirects
    • curl -L -H 'Accept: application/rdf+xml' 'http://dbpedia.org/resource/Berlin'

• Data dumps
  – http://wiki.dbpedia.org/Datasets
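The curl call can be approximated with Python's standard library. This is a sketch under the assumption that the server still honours content negotiation at that URI; the function names build_rdf_request and fetch_rdf are invented here. urllib follows HTTP redirects by default, which plays the role of curl's -L flag.

```python
import urllib.request

def build_rdf_request(uri):
    """Prepare a GET request that asks for RDF/XML via the Accept header."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})

def fetch_rdf(uri, timeout=10):
    """Send the request and return the raw response body as bytes.

    urlopen follows redirects automatically, like `curl -L`.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read()

# Building the request does not touch the network; only fetch_rdf() does.
request = build_rdf_request("http://dbpedia.org/resource/Berlin")
```

Calling fetch_rdf("http://dbpedia.org/resource/Berlin") would perform the actual download; it is left out of the example so the sketch runs offline.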

- 14 -

Querying using SPARQL

• Interactive query builders
  – SPARQL Explorer: http://dbpedia.org/snorql/
  – Examples at: http://wiki.dbpedia.org/OnlineAccess

• Using HTTP GET
  – GET /sparql/?query=EncodedQuery HTTP/1.1
  – Example:
    • SELECT ?film WHERE {
        ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
        ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>
      }
    • curl 'http://dbpedia.org/sparql?query=EncodedQuery'
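The "EncodedQuery" part is simply the query text URL-encoded into the query parameter. A minimal sketch using only the standard library follows; the helper name sparql_url is made up, and the query follows the slide's example with only ?film selected.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

# French-language films, as in the example query.
QUERY = """SELECT ?film WHERE {
  ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
  ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>
}"""

def sparql_url(endpoint, query):
    """Build the GET URL by URL-encoding the query into the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_url(ENDPOINT, QUERY)
```

The resulting url can be passed to curl or any HTTP client; decoding its query string gives back the original SPARQL text unchanged.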

- 15 -

More data

• New York Times
  – http://data.nytimes.com/
  – Example URI:
    • http://data.nytimes.com/60694995023816375851
  – Also supports JSON
    • Append .json or set Accept: text/javascript

• Freebase
  – http://freebase.com
  – Example URI:
    • http://rdf.freebase.com/rdf/en.tron_legacy
  – Data dump
    • http://download.freebase.com

- 16 -

And more data…

• Geonames: open geo data
  – Geonames.org
  – http://sws.geonames.org/5130561/
  – Download:
    • http://www.geonames.org/export/

• Open Government data efforts
  – Data.gov
    • See apps e.g. http://flyontime.us
  – Data.gov.uk
    • http://data.gov.uk/sparql

- 17 -

Spanish open gov’t data and linked data efforts

• Spanish open data efforts
  – La Asociación Española de Linked Data (AELID)
    • http://aelid.es/
  – Proyecto Aporta
    • aporta.es
  – Regional/local efforts
    • risp.asturias.es (RDF, SPARQL)
    • datos.zaragoza.es (RDF, SPARQL)
    • opendata.euskadi.net (RDF)
    • dadesobertes.gencat.cat (RDF)
  – Competition AbreDatos 2010
    • abredatos.es

- 18 -

More info

• Segaran et al.: Programming the Semantic Web, O’Reilly, 2010.
• linkeddata.org
• W3C Semantic Web Activity
  – Presentations, guides etc.
• RDF Primer
  – http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
• SPARQL query language and protocol specs
  – http://www.w3.org/TR/rdf-sparql-protocol/
  – http://www.w3.org/TR/rdf-sparql-query/

• Search SlideShare etc. for more intro material

Build your Own Search Service (BOSS)

Peter Mika

Yahoo! Research Barcelona

pmika@yahoo-inc.com

- 20 -

Innovate with Search!

• It’s really simple…

• Example:
  – Pay $0.0008 for a query, earn $0.01 per query
  – 100,000 users a day, each making 1 query a day
  – Earn $920 a day!
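The arithmetic behind the $920 figure is (revenue per query minus cost per query) times the number of queries. Here is a small worked check, using Decimal to avoid floating-point rounding; the function name daily_profit is made up, and the calculation ignores any free query allowance.

```python
from decimal import Decimal

def daily_profit(users, queries_per_user, revenue_per_query, cost_per_query):
    """Profit per day = number of queries * (revenue - cost) per query."""
    queries = users * queries_per_user
    margin = Decimal(revenue_per_query) - Decimal(cost_per_query)
    return queries * margin

# 100,000 users, 1 query each, earning $0.01 and paying $0.0008 per query.
profit = daily_profit(100_000, 1, "0.01", "0.0008")
```

The margin is $0.0092 per query, so 100,000 queries a day give the $920 quoted in the example.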

- 21 -

Reminds me of the Underpants Gnomes from South Park

• http://en.wikipedia.org/wiki/Underpants_Gnomes

- 22 -

Yahoo BOSS: Yahoo’s Search API

• Ability to re-order results and blend in additional content
• No restrictions on presentation
• No branding or attribution
• Access to multiple verticals (web search, image, news)
• Spelling suggestions
• 40+ supported language and region pairs
• Pricing (BOSS)
  – 10,000 free queries a day
  – Pay for more queries
  – Serve any ads you want
• For more info, http://developer.yahoo.com/search/boss/
• New in BOSS v2
  – Powered by Bing
  – Retrieve ads from Yahoo! and earn money ;)

- 24 -

Queries you can play with

• Yahoo!’s WebScope program
  – Data sharing with universities and research institutions
  – Some of the most exciting data that we have!
  – Request access online
    • http://webscope.sandbox.yahoo.com/
  – Requires approval by Department Chair

• For HackU, you can sign up here for access to a dataset containing real world user queries
  – Yahoo! Search Tiny Sample v1.0: a set of 4,500 queries
  – Ideal for testing and demonstrating your search-based apps
  – Can you really show something interesting for all these users?
