Semantic markup with schema.org: helping search engines understand the Web. Presented by Peter Mika, Director of Research, Yahoo Labs, March 26, 2015

Semantic mark-up with schema.org: helping search engines understand the Web

Jul 16, 2015


Peter Mika
Page 1: Semantic mark-up with schema.org: helping search engines understand the Web

Semantic markup with schema.org: helping search engines understand the Web

Presented by Peter Mika, Director of Research, Yahoo Labs | March 26, 2015

Page 2: Semantic mark-up with schema.org: helping search engines understand the Web

Real problem

Page 3: Semantic mark-up with schema.org: helping search engines understand the Web

What it’s like to be a machine?

Roi Blanco

Page 4: Semantic mark-up with schema.org: helping search engines understand the Web

What it’s like to be a machine?

↵⏏☐ģ

✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓ţğ★✜

✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫

≠=⅚©§★✓♪ΒΓΕ℠

✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ⏎⌥°¶§ΥΦΦΦ✗✕☐

Page 5: Semantic mark-up with schema.org: helping search engines understand the Web

What can we do?

Improve Information Retrieval

› Harder and harder given the same data

• Exploited term-based relevance models, hyperlink structure and interaction data

• Combination of features using machine learning

• Heavy investment in computational power

– real-time indexing, instant search, datacenters and edge services

Improve the Web

› Make the Web more searchable?

Page 6: Semantic mark-up with schema.org: helping search engines understand the Web

The Semantic Web (2001-)

Part of Tim Berners-Lee’s original proposal for the Web

Beginning of a research community

› Formal ontology

› Logical reasoning

› Agents, web services

Rough start in deployment

› Misplaced expectations

› Lack of adoption

Page 7: Semantic mark-up with schema.org: helping search engines understand the Web

The Semantic Web, May 2001

“At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules.”

(The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)

Misplaced expectations?

Page 8: Semantic mark-up with schema.org: helping search engines understand the Web

Lack of adoption

Standardization ahead of adoption

› URI, RDF, RDF/XML, RDFa, JSON-LD, OWL, RIF, SPARQL, OWL-S, POWDER …

Chicken and egg problem

› No users/use cases, hence no data

› No users/use cases, because no data

By 2007, some modest progress

› Metadata in HTML: microformats

› Linked Data: simplifying the stack

Page 9: Semantic mark-up with schema.org: helping search engines understand the Web

Microsearch internal prototype (2007)

Screenshot callouts:

› Personal and private homepage of the same person (clear from the snippet, but it could also be automatically de-duplicated)

› Conferences he plans to attend and his vacations from homepage, plus bio events from LinkedIn

› Geolocation

Page 10: Semantic mark-up with schema.org: helping search engines understand the Web

Yahoo SearchMonkey (2008)

1. Extract structured data

› Semantic Web markup

• Example:

<span property="vcard:city">Santa Clara</span>

<span property="vcard:region">CA</span>

› Information Extraction

2. Presentation

› Fixed presentation templates

• One template per object type

› Applications

• Third-party modules to display data (SearchMonkey)
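
The extraction step can be illustrated with a minimal sketch that pulls property/value pairs out of markup like the vcard example above. It uses only Python's standard html.parser module; the class and function names are hypothetical, and this is not how SearchMonkey itself was implemented.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # extracted (property, text) pairs
        self._stack = []   # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        # Remember the property (if any) of every element we enter.
        # Simplification: void elements such as <br> are not special-cased.
        self._stack.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open element, if it has a property.
        text = data.strip()
        if text and self._stack and self._stack[-1]:
            self.pairs.append((self._stack[-1], text))


def extract_properties(html):
    """Return all (property, text) pairs found in an HTML fragment."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


snippet = (
    '<span property="vcard:city">Santa Clara</span>, '
    '<span property="vcard:region">CA</span>'
)
print(extract_properties(snippet))
# [('vcard:city', 'Santa Clara'), ('vcard:region', 'CA')]
```

A fixed presentation template could then look up the pairs it needs (city, region) by property name.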

Page 11: Semantic mark-up with schema.org: helping search engines understand the Web

Effectiveness of enhanced results

Explicit user feedback

› Side-by-side editorial evaluation (A/B testing)

• Editors are shown a traditional search result and enhanced result for the same page

• Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)

Implicit user feedback

› Click-through rate analysis

• Long dwell time limit of 100s (Ciemiewicz et al. 2010)

• 15% increase in ‘good’ clicks

› User interaction model

• Enhanced results lead users to relevant documents (IV) even though they are less likely to be clicked than textual results (III)

• Enhanced results effectively reduce bad clicks!

See

› Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011: 725-734

Page 12: Semantic mark-up with schema.org: helping search engines understand the Web

Other applications of enhanced results

Google Rich Snippets - June, 2009

› Faceted search for recipes - Feb, 2011

Bing tiles - Feb, 2011

Facebook’s Like button and the Open Graph Protocol (2010)

› Shows up in profiles and news feed

› Site owners can later reach users who have liked an object

Twitter cards (2012)

› More visual/interactive tweets

Page 13: Semantic mark-up with schema.org: helping search engines understand the Web

Other types of applications: vertical search

Page 15: Semantic mark-up with schema.org: helping search engines understand the Web

Problem!

Each of these applications requires different markup

› Different schemas and syntax

What’s a publisher to do?

› Mark up the same content differently for every consumer

• Time consuming

• Error prone

Page 16: Semantic mark-up with schema.org: helping search engines understand the Web

schema.org

Collaborative effort sponsored by large consumers of Web data

› Bing, Google, and Yahoo! as initial founders (June, 2011)

› Yandex joins schema.org in Nov, 2011

Agreement on a shared set of schemas for the Web

› Available at schema.org in HTML and machine readable formats

› Free to use under W3C Royalty Free terms

Page 17: Semantic mark-up with schema.org: helping search engines understand the Web

Example

Page 18: Semantic mark-up with schema.org: helping search engines understand the Web

View source

Page 19: Semantic mark-up with schema.org: helping search engines understand the Web
Page 20: Semantic mark-up with schema.org: helping search engines understand the Web

View source

Page 21: Semantic mark-up with schema.org: helping search engines understand the Web

schema.org structure

Classes

› Each class has a label and descriptions

› Classes form a class hierarchy

• Multiple inheritance allowed but rare (a class with two super-classes)

Properties

› Each property has a label and description

› Properties have domains and ranges, and inverse properties

Datatypes

› Boolean, Date, DateTime etc.
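
As a rough illustration of this structure, the sketch below models a tiny, hand-picked subset of the vocabulary: a class hierarchy in which LocalBusiness has two super-classes, and properties with domains and ranges. The tables are simplified assumptions for illustration, not the full schema.org definitions, and the helper names are hypothetical.

```python
# Toy subset of the class hierarchy: class -> direct super-classes.
SUPERCLASSES = {
    "Thing": [],
    "Place": ["Thing"],
    "Organization": ["Thing"],
    "Person": ["Thing"],
    "LocalBusiness": ["Place", "Organization"],  # multiple inheritance
}

# Simplified property table: property -> (domain classes, range types).
PROPERTIES = {
    "name": (["Thing"], ["Text"]),
    "address": (["Place", "Organization", "Person"], ["Text", "PostalAddress"]),
    "birthDate": (["Person"], ["Date"]),
}


def ancestors(cls):
    """Return the class itself plus every class it inherits from."""
    seen = set()
    todo = [cls]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(SUPERCLASSES[current])
    return seen


def property_applies(prop, cls):
    """True if `cls` (or one of its super-classes) is in the property's domain."""
    domains, _ranges = PROPERTIES[prop]
    return any(domain in ancestors(cls) for domain in domains)


print(sorted(ancestors("LocalBusiness")))
print(property_applies("address", "LocalBusiness"))
print(property_applies("birthDate", "LocalBusiness"))
```

Because properties are inherited along every path in the hierarchy, a LocalBusiness picks up properties from both Place and Organization.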

Page 22: Semantic mark-up with schema.org: helping search engines understand the Web

schema.org usage in practice

Depends on the skillset of the publisher

› Instances are rarely given an identifier, or identified by the URL of the webpage

› schema.org consumers (validators etc.) are tolerant of mistakes

• e.g. accept text even when an object is required
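
One way a tolerant consumer might handle the "text where an object is required" case is sketched below. The function name and the policy of mapping the text to a name property are assumptions made for illustration; actual consumers may apply different rules.

```python
def normalize_value(value, expected_type):
    """Return an object-shaped value even if the publisher supplied plain text.

    Hypothetical policy: plain text is read as the `name` of an item of the
    expected type; values that are already objects are passed through.
    """
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        return {"@type": expected_type, "name": value.strip()}
    raise TypeError(f"cannot interpret {value!r} as {expected_type}")


# A publisher wrote the author as text, although a Person object is expected.
article = {"@type": "Article", "author": "Peter Mika"}
article["author"] = normalize_value(article["author"], "Person")
print(article["author"])
# {'@type': 'Person', 'name': 'Peter Mika'}
```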

Driven by applications

› Publishers often provide the minimal information required in a particular context

› Validators (Bing, Google, Yandex) validate different subsets

Page 23: Semantic mark-up with schema.org: helping search engines understand the Web

schema.org statistics

R.V. Guha: Light at the end of the tunnel (ISWC 2013 keynote)

› Over 15% of all pages now have schema.org markup

› Over 5 million sites, over 25 billion entity references

› In other words

• Same order of magnitude as the web

See also

› P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012

• Based on Bing US corpus

• 31% of webpages, 5% of domains contain some metadata

› WebDataCommons

• Based on CommonCrawl Nov 2013

• 26% of webpages, 14% of domains contain some metadata

Page 24: Semantic mark-up with schema.org: helping search engines understand the Web

schema.org process

Process

› Initial release

• Group of experts harmonizing existing vocabularies

› Regular updates based on public discussion

• Fixes

• Extensions

• Deprecation

– almost never

Tooling

› Website (App Engine)

• Open Source

› Github

Page 26: Semantic mark-up with schema.org: helping search engines understand the Web
Page 27: Semantic mark-up with schema.org: helping search engines understand the Web
Page 28: Semantic mark-up with schema.org: helping search engines understand the Web

Extensions

External proposals integrated

› News (IPTC)

› e-Commerce (GoodRelations)

› TV/Radio fixes (BBC/EBU)

› Content Accessibility (a11ymetadata.org, IMS)

› Not-for-profit Offers (BibExtend)

› Question/Answer (StackExchange, Drupal)

Further integration

› Automotive

› GS1

New extension mechanism

› Coming soon

Page 29: Semantic mark-up with schema.org: helping search engines understand the Web

schema.org and web standards

schema.org builds on Semantic Web standards

› RDFa, JSON-LD, HTML5 microdata

Not a standardization effort in the classical sense

› Continuously evolving ontology

› Huge scope (‘everything on the Web’)

› Shallow depth compared to more targeted efforts

More specialized discussions typically at more targeted forums

› e.g. W3C Community Groups

Large enumerations and/or rapidly changing knowledge maintained elsewhere

› e.g. PlaceOfWorship

› BuddhistTemple, CatholicChurch, Church, HinduTemple, Mosque, Synagogue …

› Meanwhile over at Wikipedia:

• https://en.wikipedia.org/wiki/Place_of_worship

• https://www.wikidata.org/wiki/Q1370598

Page 30: Semantic mark-up with schema.org: helping search engines understand the Web
Page 31: Semantic mark-up with schema.org: helping search engines understand the Web

BibExtend Community Group (W3C)

Page 32: Semantic mark-up with schema.org: helping search engines understand the Web
Page 33: Semantic mark-up with schema.org: helping search engines understand the Web
Page 34: Semantic mark-up with schema.org: helping search engines understand the Web

What’s new?

Page 35: Semantic mark-up with schema.org: helping search engines understand the Web

Task completion

We would like to help our users in task completion

› But we have trained our users to talk in nouns

• Retrieval performance decreases by adding verbs to queries

› We need to understand what the available actions are

Schema.org Actions

› Describe what actions can be taken on a page/email

› See blog post and overview article

Page 36: Semantic mark-up with schema.org: helping search engines understand the Web

Actions

Schema.org v1.2 (April, 2014)

› See blog post and overview article for detail.

› and public-vocabs threads for even more details.

Page 37: Semantic mark-up with schema.org: helping search engines understand the Web
Page 38: Semantic mark-up with schema.org: helping search engines understand the Web

Actions example

Here is a Product and a potential action (Buy)

{
  "@type": "Product",
  "url": "http://example.com/products/ipod",
  "potentialAction": {
    "@type": "BuyAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "https://example.com/products/ipod/buy",
      "encodingType": "application/ld+json",
      "contentType": "application/ld+json"
    },
    "result": {
      "@type": "Order",
      "url-output": "required",
      "confirmationNumber-output": "required",
      "orderNumber-output": "required",
      "orderStatus-output": "required"
    }
  }
}

After POSTing the request to the EntryPoint, here is your completed action

{
  "@type": "BuyAction",
  "actionStatus": "CompletedActionStatus",
  "object": "https://example.com/products/ipod",
  "result": {
    "@type": "Order",
    "url": "http://example.com/orders/1199334",
    "confirmationNumber": "1ABBCDDF23234",
    "orderNumber": "1199334",
    "orderStatus": "PROCESSING"
  }
}
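
How a client might use the two documents together can be sketched as follows: read the properties marked "-output": "required" in the potential action, then check that the completed action's result supplies each of them. The helper names are hypothetical, and the sketch only inspects the JSON; it does not send the POST request.

```python
# The potential action from the Product description (abridged to what is used).
potential_action = {
    "@type": "BuyAction",
    "target": {
        "@type": "EntryPoint",
        "urlTemplate": "https://example.com/products/ipod/buy",
    },
    "result": {
        "@type": "Order",
        "url-output": "required",
        "confirmationNumber-output": "required",
        "orderNumber-output": "required",
        "orderStatus-output": "required",
    },
}

# The completed action returned by the EntryPoint.
completed_action = {
    "@type": "BuyAction",
    "actionStatus": "CompletedActionStatus",
    "object": "https://example.com/products/ipod",
    "result": {
        "@type": "Order",
        "url": "http://example.com/orders/1199334",
        "confirmationNumber": "1ABBCDDF23234",
        "orderNumber": "1199334",
        "orderStatus": "PROCESSING",
    },
}

OUTPUT_SUFFIX = "-output"


def required_outputs(action):
    """Names of result properties the action declares as required outputs."""
    result = action.get("result", {})
    return sorted(
        key[: -len(OUTPUT_SUFFIX)]
        for key, value in result.items()
        if key.endswith(OUTPUT_SUFFIX) and value == "required"
    )


def missing_outputs(potential, completed):
    """Required output properties that the completed action did not supply."""
    supplied = completed.get("result", {})
    return [name for name in required_outputs(potential) if name not in supplied]


print(required_outputs(potential_action))
# ['confirmationNumber', 'orderNumber', 'orderStatus', 'url']
print(missing_outputs(potential_action, completed_action))
# []
```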

Page 39: Semantic mark-up with schema.org: helping search engines understand the Web

Interactive search results (Yandex Islands)

Page 40: Semantic mark-up with schema.org: helping search engines understand the Web

(Possible) example: quick unsubscribe

› How do I unsubscribe?

› Not very visible to humans…

Page 41: Semantic mark-up with schema.org: helping search engines understand the Web

Q&A

Many thanks to

› The schema.org group and the many contributors to schema.org

› Dan Brickley

Get involved

› Join the discussion at [email protected]

› File a bug, fork a schema, track releases on GitHub

Contact me

[email protected]

› @pmika

› http://www.slideshare.net/pmika/