Search@SIMS A metadata-based approach Marti Hearst Associate Professor BT Visit August 18, 2005

Search@SIMS: A metadata-based approach

Marti Hearst, Associate Professor

BT Visit, August 18, 2005

BT Visit, Aug 18, 2005 | Marti Hearst: UC Berkeley SIMS

The Problem:

How to help people navigate and organize the world’s information?

The SIMS Solution

Focus on METADATA

• System Support for Structured Search: Cheshire
• Search User Interfaces: Flamenco
• Community-based Metadata Creation: MMM
• Content Analysis for Metadata Creation: Mamba

Example: Search and Navigation of Large Collections

• Image Collections
• E-Government Sites
• Shopping Sites
• Digital Libraries

Example: the University of California Library Catalog

What do we want done differently?

• Organization of results
• Hints of where to go next
• Flexible ways to move around
• … How to structure the information?

How to Structure Information for Search and Browsing?

• Hierarchy is too rigid

• KL-ONE is too complex

• Hierarchical faceted metadata:
– A useful middle ground

What are facets?
• Sets of categories, each of which describes a different aspect of the objects in the collection.
• Each of these can be hierarchical.
• (Not necessarily mutually exclusive nor exhaustive, but often that is a goal.)

Example facets: Time/Date, Topic, Role, GeoRegion

Facet example: Recipes

• Course: Main Course
• Cooking Method: Stir-fry
• Cuisine: Thai
• Ingredient: Red Bell Pepper, Curry, Chicken
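
One way to picture the recipe example is as a record whose facet values are paths through small hierarchies. The sketch below is illustrative only: the `Item` class, its `matches` helper, the recipe title, and the intermediate levels `Vegetable` and `Poultry` are assumptions made for the example, not something shown on the slide or taken from the Flamenco system.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# A facet value is a path from the facet root down to a category,
# e.g. ("Ingredient", "Poultry", "Chicken").
FacetPath = Tuple[str, ...]


@dataclass
class Item:
    """A collection item described by paths in several independent facets."""
    title: str
    facets: Set[FacetPath] = field(default_factory=set)

    def matches(self, prefix: FacetPath) -> bool:
        """True if any facet path of this item starts with `prefix`."""
        return any(path[:len(prefix)] == prefix for path in self.facets)


# Hypothetical record for the recipe on the slide. The levels
# "Vegetable" and "Poultry" are invented here to show hierarchy.
recipe = Item(
    title="Thai chicken curry stir-fry",
    facets={
        ("Course", "Main Course"),
        ("Cooking Method", "Stir-fry"),
        ("Cuisine", "Thai"),
        ("Ingredient", "Vegetable", "Red Bell Pepper"),
        ("Ingredient", "Curry"),
        ("Ingredient", "Poultry", "Chicken"),
    },
)

# Selecting an inner node of a facet also matches its descendants.
print(recipe.matches(("Ingredient", "Poultry")))
print(recipe.matches(("Cuisine", "Italian")))
```

Storing each value as a full path keeps the facets independent of one another while still letting a broad category stand in for everything beneath it.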

How to Put In an Interface? Some Challenges:

• Users don’t like new search interfaces.

• How to show lots of information without overwhelming or confusing?

A Solution (The Flamenco Project)

• Use proper HCI methods.

• Organize search results according to the faceted metadata so navigation looks similar throughout

– Easy to see where to go next and where you’ve been

– Avoids empty result sets

– Integrates seamlessly with keyword search
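
The points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Flamenco's implementation: the toy `ITEMS` data and the `search` and `refinements` functions are hypothetical. The idea shown is that the interface only offers facet values that occur in the current result set, with counts, so no offered link can produce an empty result, and a keyword filter composes with facet selections.

```python
from collections import Counter

# Hypothetical toy collection: free text plus flat (facet, value) pairs.
ITEMS = [
    {"text": "thai chicken curry",
     "facets": {("Cuisine", "Thai"), ("Course", "Main Course")}},
    {"text": "thai iced tea",
     "facets": {("Cuisine", "Thai"), ("Course", "Beverage")}},
    {"text": "chicken pot pie",
     "facets": {("Cuisine", "American"), ("Course", "Main Course")}},
]


def search(items, keyword="", selected=frozenset()):
    """Keep items that contain `keyword` and carry every selected facet value."""
    return [
        it for it in items
        if keyword.lower() in it["text"].lower() and selected <= it["facets"]
    ]


def refinements(results, selected=frozenset()):
    """Count current results under each facet value not yet selected.

    Only values that occur in `results` appear, so every offered
    refinement leads to at least one item.
    """
    counts = Counter()
    for it in results:
        for pair in it["facets"]:
            if pair not in selected:
                counts[pair] += 1
    return counts


selected = frozenset({("Cuisine", "Thai")})
hits = search(ITEMS, keyword="chicken", selected=selected)
print([it["text"] for it in hits])
print(refinements(hits, selected))
```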

Art History Images Collection

Usability Studies
• Usability studies done on 3 collections:
– Recipes: 13,000 items
– Architecture Images: 40,000 items
– Fine Arts Images: 35,000 items

• Conclusions:
– Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
– Very positive results, in contrast with studies on earlier iterations.

Post-Test Comparison

Which Interface Preferable For:            Baseline  Faceted
Find images of roses                             15       16
Find all works from a given period                2       30
Find pictures by 2 artists in same media          1       29

Overall Assessment                         Baseline  Faceted
More useful for your tasks                        4       28
Easiest to use                                    8       23
Most flexible                                     6       24
More likely to result in dead ends               28        3
Helped you learn more                             1       31
Overall preference                                2       29

Cheshire: System Support for Metadata-based Search

• Cheshire is an XML/SGML Information Retrieval system using probabilistic relevance ranking

• Cheshire3 includes Grid-based data storage and processing support, permitting very large-scale databases and high efficiency while providing effective relevance ranked results
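
To give a flavour of what probabilistic relevance ranking means in general, the sketch below scores documents with the textbook Robertson/Sparck Jones term weight (the form used when no relevance feedback is available). This is a stand-in chosen for brevity and is not Cheshire's actual ranking formula; the toy documents and the names `rsj_weight` and `rank` are assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy collection; each document is treated as a set of words.
DOCS = {
    "d1": "faceted metadata for image search",
    "d2": "probabilistic ranking for text retrieval",
    "d3": "image retrieval with metadata",
    "d4": "usability study of search interfaces",
    "d5": "grid storage for large databases",
}


def rsj_weight(n_docs: int, doc_freq: int) -> float:
    """Robertson/Sparck Jones weight with no relevance information.

    Rare terms get a high weight; terms found in more than half of
    the collection get a negative one.
    """
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def rank(query: str, docs: dict) -> list:
    """Return (doc_id, score) pairs sorted by descending score."""
    tokens = {doc_id: set(text.split()) for doc_id, text in docs.items()}
    doc_freq = Counter(term for terms in tokens.values() for term in terms)
    scores = {
        doc_id: sum(
            rsj_weight(len(docs), doc_freq[term])
            for term in query.split()
            if term in terms
        )
        for doc_id, terms in tokens.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for doc_id, score in rank("image retrieval", DOCS):
    print(doc_id, round(score, 3))
```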

Cheshire
• The system is currently in production use for many JISC-funded national information services and projects in the UK, including:
– The Archives Hub
– MerseyLibraries
– Resource Discovery Network (RDN)
– National Centre for Text Mining (NaCTeM)

Mamba: Creating Classifications from Data

• Most approaches are associational
– AKA clustering, LSA, LDA, etc.
– This leads to poor results when applied to text

• To derive facets, need a different angle
– We have a simple approach based on WordNet

Example: Recipes (3500 docs)

Stoica & Hearst ’04, WordNet-based

Our Approach
• Leverage the structure of WordNet

Pipeline: Documents → Select terms → Get hypernym paths (from WordNet) → Build tree → Compress tree
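
The last two stages of the pipeline (build tree, compress tree) can be sketched as follows. This is a simplified illustration, not the published algorithm: the hypernym paths are hand-written stand-ins rather than real WordNet output, and the compression rule used here (drop any node that has exactly one child) is an assumption chosen to keep the example short.

```python
# Hand-written hypernym paths standing in for WordNet lookups
# (assumed data, not real WordNet output). Each path runs from a
# general root down to the selected term.
HYPERNYM_PATHS = {
    "chicken": ["food", "meat", "poultry", "chicken"],
    "turkey": ["food", "meat", "poultry", "turkey"],
    "basil": ["food", "flavoring", "herb", "basil"],
    "mint": ["food", "flavoring", "herb", "mint"],
}


def build_tree(paths):
    """Merge root-to-term paths into one nested-dict tree."""
    root = {}
    for path in paths:
        node = root
        for label in path:
            node = node.setdefault(label, {})
    return root


def compress(tree):
    """Drop nodes that have exactly one child, promoting that child."""
    out = {}
    for label, children in tree.items():
        children = compress(children)
        if len(children) == 1:
            out.update(children)  # skip the single-child node
        else:
            out[label] = children
    return out


full = build_tree(HYPERNYM_PATHS.values())
print(full)
print(compress(full))
```

In this toy run the single-child levels `meat` and `flavoring` disappear, leaving `food` with the two groups `poultry` and `herb`, which is closer to what a browsing hierarchy needs.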

A New Opportunity
• Tagging, folksonomies
– (flickr, del.icio.us)
– People are creating facets in a decentralized manner
– They are assigning multiple facets to items
– This is done on a massive scale
– This leads naturally to meaningful associations
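
A simple way to see how tagging yields associations is to count which tags are assigned to the same item. The sketch below assumes a small made-up set of tagged items; `TAGGED_ITEMS` and `cooccurrence` are hypothetical names, and real tagging data would be far larger and noisier.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged items, in the style of a photo- or bookmark-sharing site.
TAGGED_ITEMS = [
    {"berkeley", "campus", "sunset"},
    {"berkeley", "campus", "library"},
    {"sunset", "beach"},
    {"berkeley", "library"},
]


def cooccurrence(items):
    """Count how often each unordered pair of tags shares an item."""
    pairs = Counter()
    for tags in items:
        # Sorting makes (a, b) and (b, a) count as the same pair.
        for a, b in combinations(sorted(tags), 2):
            pairs[(a, b)] += 1
    return pairs


for pair, count in cooccurrence(TAGGED_ITEMS).most_common(3):
    print(pair, count)
```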

Recap

• Organizing and Navigating Information is a huge IT opportunity

• Several research projects at SIMS tackle this with a special perspective: METADATA
– System support for efficient search over structured information
– User interfaces using hierarchical faceted metadata
– Community-based metadata creation
– Automated analysis algorithms for metadata creation

Thank you!