RawSugar Web 2.0, Tagging, Search engines, RawSugar Frank Smadja RawSugar May 2006

Dec 21, 2015

Transcript
Page 1:

RawSugar

Web 2.0, Tagging, Search engines, RawSugar

Frank Smadja, RawSugar

May 2006

Page 2:

What is Web 2.0?

Tim O’Reilly: Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications are those that make the most of the intrinsic advantages of that platform: delivering software as a continually-updated service that gets better the more people use it, consuming and remixing data from multiple sources, including individual users, while providing their own data and services in a form that allows remixing by others, creating network effects through an "architecture of participation," and going beyond the page metaphor of Web 1.0 to deliver rich user experiences.

http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

Page 3:

What is Web 2.0?

Social Web – “Wisdom of Crowds”
– Users are publishers
– Network effect – SHARE
– E.g.: blogger.com, flickr, youtube, del.icio.us, tadalist.com, i4giveu.com

Technology:
– Software delivery: Hours, Users are testers
– AJAX (more later)
– E.g.: 30Boxes, Writely, Google Calendar

Business model:
– Free for users, Paid Advertisements
– Share revenues with users
– E.g.: Google AdSense, Simpy, RawSugar
– Pageviews => $$$$

Page 4:

Social Web – Wisdom of Crowds

(1) diversity of opinion

(2) independence of members from one another

(3) decentralization and

(4) a good method for aggregating opinions

Show: Digg amazon.com Yahoo! Movies

Page 5:

What is Tagging?

From Gary Larson

Page 6:

Tagging Example

Page 7:

Before Tagging: Classification

• Too hard to classify
• Too expensive
• Not scalable

• Yahoo! directory
• Dmoz
• Semantic Web

Page 8:

Categorization is hard!!

Multiple concepts activated

Choose ONE of the activated concepts.

Categorize it!

Object worth remembering (article, image…)

Analysis-Paralysis!

From Rashmi Sinha

Page 9:

Tagging is simpler

Multiple concepts are activated

Tag it!

Note all concepts

Object worth remembering (article, image…)

From Rashmi Sinha

Page 10:

The Personal to the Social

From Rashmi Sinha

Page 11:

Tagging is a reality

• Bookmarkers tag:
– Delicious, RawSugar, Shadows, Simpy, Blinklist, …

• Bloggers tag:
– 27 million blogs, doubles every 6 months
– 1/3rd of blog posts now use tags (or categories)

• Many more:
– BBC – news site
– Digg – news
– YouTube – video
– Flickr – photo publishing and tagging
– Enterprise? Museums? Cell phones?

Most user-generated content is tagged!

Page 12:

What Tagging is NOT

– NOT: Generous and altruistic people classifying the Web for the sake of the community

– NOT: Smart software automatically classifying Web pages and tagging them

– NOT: A collaborative way to classify the web into a growing giant ontology (folksonomy)

Page 13:

So why do People Tag?

– Recovery/sharing of personal information:
• Bookmarks
• Photos
• Videos, etc.

– Increased traffic and findability
• Bloggers

– Social reward
– Advertisement $

Tagging brings value to the tagger

Page 14:

Why is Tagging successful?

                        Semantic Web               Tagging
Who classifies          Publishers or Librarians   Everybody, consumers
Controlled vocabulary   Yes                        No
Imposed structure       Yes                        No
Classification cost     High                       Free
Recovery                NA                         Yes
Searchability           Low                        Medium
Navigation              High                       Medium

• Tagging is free
• Tagging is easy
• Tagging brings value

[Marlow, Naaman, Boyd & Davis 2006]

Page 15:

RawSugar

• Covers the last mile of search
• Provides Guided Search on tagged pages
• Publish guided search
– Provide guided search to your site or blog
– Get more traffic
– Receive advertising revenues!
• Search and Explore
– Navigate by topics, people, directories
– Find Experts

Page 16:

Nothing to eat here!

Page 17:

Still no food here!

Page 18:

Bingo!

Page 19:

What’s Great, What’s Not Great?

• Great:
– You know what you’re looking for:
• “Zibibbo restaurant”

• Not so great:
– You’re hungry!
– You want to browse: discover information, explore.
– You want to know what is popular (“restaurants, digital camera, Java Tutorial, Free Games, etc.”)

Page 20:

State of the art: The Last Mile of Search

• 83% unhappy with search results (WSJ survey)
– Most searches point to a list of content websites and directories
– Navigation of these sites is cumbersome and tedious

• Google's two-step approach:
– Search “restaurants”
– While (true) { explore guide; }
– Change the query and repeat

“The last mile of search” examples:
Digital Camera
Palo Alto bike
Daily Kos
Sprol dot Com

Page 21:

Where is the last mile?

Google stops here:

Human Knowledge:
• Small and mid-size websites and blogs
• Content is organized manually by humans:
– Categorization
– Recommendations
• Poor search and navigation
• Each directory is an island of information and does not connect to related directories

Page 22:

What’s Missing? Browsing with Facets

“Easy to discover information without prior knowledge of collection contents”

Faceted Search Paradigm

Not new:
• Library systems: “American history”, “Shakespeare”, etc.
• Search Engines: Endeca, Shopping.com, Yahoo! Directories, Dmoz, etc.
• Google/MSN/Yahoo! Local Search – Browse by Location
• Current uses: E-Commerce

Problems:
• Maintained by humans – Expensive
• Rely on a world order – Brittle
• Facets use a controlled vocabulary – Not easy to define

=> Not Scalable

Page 23:

Amazon – Faceted Search

Search for Tel Aviv

Page 24:

Shopping.com Faceted Search

Search for Tel Aviv

Page 25:

RawSugar Faceted Search

Refine your search

Page 26:

RawSugar Faceted Search

Juniorbonner on del.icio.us vs. Juniorbonner on RawSugar

Page 27:

RawSugar Into the Last Mile

RawSugar inside

Page 28:

RawSugar Into the Last Mile

RawSugar inside

Page 29:

RawSugar Faceted Search in the last mile

Daily Kos Blog

Search for Iran on RawSugar

Page 30:

RawSugar Technology

Page 31:

Problem 1: Searching the TagSpace

Tags: Ikura, Uni, Ebi, Sushi, Nigiri, Japanese food, lunch in Tokyo, Ezobafun-uni, Kitamurashiuni, Murasakiuni, Akazaebi, Tenagaebi, etc.

How would you tag this?

How would you search for it?

Page 32:

Problem 2: Exploring the TagSpace

morphology

Locations

Restaurant Type

Not a restaurant!

Page 33:

Problem 3: Exploring the TagSpace

Not usable!

Page 34:

RawSugar – Tag Hierarchy

Guided Navigation

Food groups

Locations groups

Origins groups

Page 35:

RawSugar Tag Hierarchy

• Key idea: Some users (4%) define tag hierarchies – (food>sushi, european>spanish, …)

• We mine this tag space to learn simple tag-relations (ISA relations and RELATED) using statistics.

• At search time: We apply this learned knowledge to group tags from results.
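
The mining and grouping steps above can be sketched roughly as follows. This is a minimal illustration, not RawSugar's actual algorithm: it assumes each user's hierarchy is available as (parent, child) tag pairs, and it accepts an ISA relation once a hypothetical threshold `min_users` of distinct users agree on it. The function names, the threshold, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

def learn_isa(user_hierarchies, min_users=2):
    """Return {child: parent} for pairs declared by at least min_users users.

    user_hierarchies maps a user id to a list of (parent, child) tag pairs,
    e.g. ("food", "sushi") for the hierarchy food>sushi.
    """
    supporters = defaultdict(set)
    for user, pairs in user_hierarchies.items():
        for parent, child in pairs:
            supporters[(parent.lower(), child.lower())].add(user)

    best = {}  # child -> (parent, number of supporting users)
    for (parent, child), users in supporters.items():
        if len(users) < min_users:
            continue  # too little agreement, treat as noise
        if child not in best or len(users) > best[child][1]:
            best[child] = (parent, len(users))
    return {child: parent for child, (parent, _) in best.items()}

def group_tags(result_tags, isa):
    """Group the tags found on search results under their learned parent."""
    groups = defaultdict(list)
    for tag in result_tags:
        groups[isa.get(tag.lower(), "other")].append(tag)
    return dict(groups)

# Hypothetical sample data: four users, each with a small hierarchy.
hierarchies = {
    "user1": [("food", "sushi"), ("european", "spanish")],
    "user2": [("food", "sushi"), ("food", "pizza")],
    "user3": [("european", "spanish"), ("food", "pizza")],
    "user4": [("japan", "sushi")],
}
isa = learn_isa(hierarchies)
print(group_tags(["sushi", "pizza", "spanish", "lunch"], isa))
```

Counting distinct users rather than raw occurrences keeps one prolific tagger from dictating the hierarchy; a real system would likely need a more careful statistic than a fixed threshold.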

Page 36:

RawSugar – Guided Search: Combining Hierarchy Fragments

europe

UK

Scotland

Edinburgh

Spain

Italy

food

vegetarian

Sushi

food

cooking

recipes

Asian

Chinese

Thai

Southwest

California

Bay Area

San Francisco

Texas

User 1

User 2

User 3

User 4

User 5

Page 37:

RawSugar: Mining and Clustering

• Related tags: Tags that are related (collocations, synonymy, antonymy, ISA, HASA, …)

• Related pages: Pages tagged similarly

• Related people: People with similar interests
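
One way to picture the three kinds of relatedness above is as set overlap over the same tagging data. The sketch below assumes bookmarks are stored as (user, page, tag) triples and uses Jaccard similarity as a stand-in measure; the slides do not say which statistic RawSugar uses, and the function names and sample data are hypothetical.

```python
from collections import defaultdict

FIELDS = {"user": 0, "page": 1, "tag": 2}

def index_by(bookmarks, key, value):
    """Build {key_item: set(value_items)} from (user, page, tag) triples."""
    index = defaultdict(set)
    for triple in bookmarks:
        index[triple[FIELDS[key]]].add(triple[FIELDS[value]])
    return dict(index)

def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b|, or 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related(item, index, top=3):
    """Rank the other keys of index by similarity to item, dropping zero scores."""
    mine = index.get(item, set())
    scores = [(jaccard(mine, members), other)
              for other, members in index.items() if other != item]
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))[:top]
    return [other for score, other in ranked if score > 0]

# Hypothetical sample data.
bookmarks = [
    ("ann", "p1", "sushi"), ("ann", "p1", "japanese"),
    ("bob", "p2", "sushi"), ("bob", "p2", "japanese"),
    ("bob", "p3", "sailing"),
    ("cat", "p1", "sushi"),
    ("cat", "p4", "cycling"),
]
tag_pages = index_by(bookmarks, "tag", "page")   # related tags: shared pages
page_tags = index_by(bookmarks, "page", "tag")   # related pages: shared tags
user_tags = index_by(bookmarks, "user", "tag")   # related people: shared tags

print(related("sushi", tag_pages))
print(related("p1", page_tags))
print(related("ann", user_tags))
```

The same `related` function serves all three cases; only the index changes, which mirrors the Tags / Pages / People view of the tag space.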

Tags

Pages

People

RawSugar TagSpace

sailing

Cycling group

Page 38:

Related work

Rashmi Sinha: “Tag Sorting: Another tool in an information architect's toolbox” http://www.rashmisinha.com/archives/05_02/tag-sorting.html

Emanuele Quintarelli: “Hierarchical taxonomies from flat tag spaces” http://www.infospaces.it/wordpress/topics/information-architecture/91

Paul Heymann (Stanford): “Tag Hierarchies” http://i.stanford.edu/~heymann/taghierarchy.html

Brooks, Montanez, University of San Francisco: “Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering ” http://www.cs.usfca.edu/~brooks/papers/brooks-montanez-www06.pdf

Siderean fac.etio.us: “Faceted search on delicious tags” http://www.siderean.com/delicious/facetious.jsp

Marti Hearst: “Clustering vs. Faceted Search” http://bailando.sims.berkeley.edu/papers/cacm06.pdf

And more …

Page 39:

Conclusion

Questions?

Page 40:

Backup Technology Slides

Page 41:

What should we do? Smart Backend – Easy Tagging

“Tag Relations improve searchability and exploration.”

Similar tags:
• Spelling and morphology: macos <-> mac_os <-> mac os; tagging <-> tags <-> tagged
• Synonyms: macos <-> tiger; films <-> movies; new york <-> nyc
• Related: cooking <-> recipes; software development <-> programming

Tag groups or subtags:
• Location -> san francisco, london, new york, etc.
• Food -> sushi, sashimi, pizza, etc.
• Programming -> html, java, css, etc.

Goal: Discover them by mining the tag space
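
The spelling and morphology case is the easiest of these relations to approximate with string rules alone. The sketch below is a deliberately crude illustration: it lowercases, removes separators, and strips a few English suffixes as a stand-in for a real stemmer. The `normalize` and `group_variants` names are hypothetical, and synonyms such as films <-> movies cannot be found this way; they need usage data.

```python
import re
from collections import defaultdict

def normalize(tag):
    """Map spelling variants of a tag to one grouping key (not a display form)."""
    # Ignore case and separators: "mac_os", "mac os", "macos" all match.
    key = re.sub(r"[\s_\-]+", "", tag.lower())
    # Naive suffix stripping; keeps at least three characters of stem.
    for suffix in ("ing", "ed", "s"):
        if key.endswith(suffix) and len(key) - len(suffix) >= 3:
            key = key[: -len(suffix)]
            break
    # Undo consonant doubling left by the suffix: "tagg" -> "tag".
    if len(key) >= 4 and key[-1] == key[-2] and key[-1] not in "aeiou":
        key = key[:-1]
    return key

def group_variants(tags):
    """Return lists of tags that share a normalized key (groups of 2 or more)."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize(tag)].append(tag)
    return [variants for variants in groups.values() if len(variants) > 1]

tags = ["macos", "mac_os", "mac os", "tagging", "tags", "tagged", "films", "movies"]
print(group_variants(tags))
```

The key is only used for grouping: stripping the trailing "s" turns "macos" into "maco", which is fine as an internal key but would be wrong to show to users.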

Page 42:

What should we do? Smart Backend – Friendly Frontend

• Backend should not dictate Frontend (Patrick Schmitz, Berkeley/Yahoo!)

• Smart processing is done by the backend under the hood.

• Tagging should be as effortless as possible, assisted but not automatic. Fight Analysis-Paralysis (Rashmi Sinha)

• Systems should be built to incite people to tag. Bring value to the tagger.

Page 43:

What is Missing? Tag relations

“Tag Relations improve searchability and exploration.”

Similar tags:
• Spelling and morphology: macos <-> mac_os <-> mac os; tagging <-> tags <-> tagged
• Synonyms: macos <-> tiger; films <-> movies; new york <-> nyc
• Related: cooking <-> recipes; software development <-> programming

Tag groups or subtags:
• Location -> san francisco, london, new york, etc.
• Food -> sushi, sashimi, pizza, etc.
• Programming -> html, java, css, etc.

Goal: Discover them by mining the tag space

Page 44:

Flickr – Clusters

Page 45:

Clustering – Step 1: Similarity among tags

Page 46:

Some good Clusters found

Page 47:

Tags that belong to the same clusters -

Page 48:

Dmoz – World Order

Page 49:

Dmoz – World Order

Page 50:

Recommendations: dpreview

Page 51:

Faceted Search on TagSpace: Challenges

• Faceted search paradigm on the TagSpace:
– Not a controlled environment
– Large scale (1 facet for every 5 documents)
– Lots of noise: search, search engine, google, search_engines, searchengine, searchengines, search_engine, engine, web, internet, tools, reference, news, information, portal, engines, searching, tech, buscadores, tool …

Page 52:

Faceted Search on TagSpace: Challenges

How to rank facets? What facets should be displayed? How to show them?

• Performance: Reduce the search space
• Refining facets: Tags that allow the user to refine (reduce) the search (depth)
• Related facets: Tags that allow the user to explore (breadth)
• Group facets: Cluster tags that are related
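
As a rough illustration of the refining-facet idea, the sketch below counts the tags that co-occur with a query tag and keeps only those that would actually narrow the result set. It assumes documents are held in memory as a mapping from id to a set of tags and ranks facets by simple frequency; the `refining_facets` name, the data, and the ranking rule are assumptions for the example, and related (breadth) facets are not covered.

```python
from collections import Counter

def refining_facets(docs, query_tag, top=5):
    """Return (tag, count) pairs that would narrow the results for query_tag."""
    results = [doc for doc, tags in docs.items() if query_tag in tags]
    counts = Counter(tag for doc in results for tag in docs[doc] if tag != query_tag)
    # A tag present on every result cannot reduce the result set, so skip it.
    narrowing = [(tag, n) for tag, n in counts.items() if n < len(results)]
    # Most frequent first; ties broken alphabetically for a stable display order.
    return sorted(narrowing, key=lambda item: (-item[1], item[0]))[:top]

# Hypothetical sample data.
docs = {
    "d1": {"restaurants", "sushi", "palo alto", "food"},
    "d2": {"restaurants", "sushi", "san francisco", "food"},
    "d3": {"restaurants", "pizza", "palo alto", "food"},
    "d4": {"cameras", "reviews"},
}
print(refining_facets(docs, "restaurants"))
```

Capping the list with `top` is one simple answer to the "what facets should be displayed" question; grouping the surviving tags under learned parents would be the natural next step.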

Page 53:

Before RawSugar

Page 54:

With RawSugar

navigation

Other users

Page 55:

Searching the TagSpace with RawSugar: Suggestion Engine

Goals:
- Ease of tagging
- Cohesiveness of our tagspace: we try to get our users to re-use the same tags instead of creating infinite variations (search engines, searchengine, search, search tools, search sites, etc.)

Key ideas:
- Always suggest the most popular tags first
- Use tag hierarchy and tag context to find the most relevant tags
- Use information on the user and the other users to refine the suggestions
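
The key ideas above could be combined along these lines. This is a hedged sketch, not the RawSugar suggestion engine: it assumes tag popularity is available as a count per tag, matches suggestions by prefix, and boosts tags the user has used before or that other users put on the same page. The `suggest` function, its parameters, and the boost factors are all hypothetical.

```python
from collections import Counter

def suggest(prefix, global_counts, user_tags=(), context_tags=(), limit=3):
    """Suggest existing tags that start with prefix, most useful first.

    Score = global popularity, multiplied by 2 if the user already uses the
    tag and by 3 if other users applied it to the same page. Both factors
    are arbitrary illustrative values.
    """
    prefix = prefix.lower()
    scored = []
    for tag, count in global_counts.items():
        if not tag.startswith(prefix):
            continue
        score = count
        if tag in user_tags:
            score *= 2
        if tag in context_tags:
            score *= 3
        scored.append((-score, tag))  # negative score sorts highest first
    return [tag for _, tag in sorted(scored)[:limit]]

# Hypothetical popularity counts.
popularity = Counter({
    "search engines": 120, "search": 90, "searchengine": 15,
    "search tools": 8, "seo": 40,
})
print(suggest("sea", popularity))
print(suggest("sea", popularity, context_tags={"search tools"}))
```

Because only existing tags are ever suggested, popular spellings tend to win over new variants, which is the cohesiveness goal stated above.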

Page 56:

What’s Missing? Human Meta Knowledge

Is it good or not? What is it about? Is it popular?

Not new:
• Guides: paloaltoonline.com, expedia.com, etc.
• Review sites: Zagat.com, dpreview.com, etc.
• Shopping sites: shopping.com, Amazon

Problems:
• Limited to small environments or verticals (digital camera, restaurants, etc.)
• Not real search across sites
• Manpower: hiring, training, etc.

=> Not Scalable