Mira Dontcheva, Steven M. Drucker, David Salesin, Michael F. Cohen. Oct. 2007, UIST '07. Lin Yen Ling, 20080930.
Page 1: 20080930

Mira Dontcheva, Steven M. Drucker, David Salesin, Michael F. Cohen

Oct. 2007, UIST '07

Lin Yen Ling, 20080930


Page 2: 20080930

OUTLINE

Introduction
Related Work
The Summaries Framework
Example Scenario
System Overview
Retrieval Using Relationships
Authoring Cards
Template-based Search
Exploratory User Study
Conclusions and Future Work


Page 3: 20080930

INTRODUCTION

The work focuses on helping people interact with and gather Web content. The goal is to lower the effort necessary for collecting, organizing, managing, and sharing that content.

They present three new techniques that build on the existing summaries framework and interaction paradigms.


Page 4: 20080930

INTRODUCTION

Three new techniques:

An interaction technique that allows users to specify relations between websites and use these relations to automatically collect data from multiple websites.

An interface for merging content from multiple websites and organizing it visually.

A novel search paradigm for collecting content from the Web with search templates.


Page 5: 20080930

RELATED WORK

Managing Web content

WebBook (1996), Data Mountain (1998), TopicShop (2003)

Hunter Gatherer (2002), Internet Scrapbook (1998)

Semantic Web community: Piggy Bank (2005), Thresher (2005)

Summaries framework (2006)
To give the user interactive tools for specifying relations between disparate data sources.


Page 6: 20080930

RELATED WORK

Collecting content using relations

Data integration
Complementary to database research
End-user programming for the Web

Chickenfoot (2005), RecipeSheet (2006), C3W (2004), Marmite (2007), and Yahoo Pipes

Simple graphical interface for mixing content from different sources.

Interactive layout editing
Sketchpad: a man-machine graphical communication system (1963)
Inferring constraints from multiple snapshots (1993)
Inferring graphical constraints with Rockit (1993)

Similar to commercial HTML editors.


Page 7: 20080930

RELATED WORK

Formatting search results

Stuff I’ve Seen (2003)
Clusty (2004): goes one step further and clusters the search results according to topic.
Grokker (2005)

To go beyond clustering and reorganizing URLs.


Page 8: 20080930

THE SUMMARIES FRAMEWORK

Built on top of the summaries framework, which provides an interface for interactively creating extraction patterns for webpages.

Implemented as a browser extension, written in JavaScript and XUL.

Uses the extraction techniques that are already part of the summaries framework: DOM-based and context-based rules.

Focus on new applications for the extracted content.


Page 9: 20080930

EXAMPLE SCENARIO

Shows the steps taken by a user as he looks for a restaurant for a night out in Seattle.


Page 10: 20080930

EXAMPLE SCENARIO - Relations


Page 11: 20080930

EXAMPLE SCENARIO - Cards


Page 12: 20080930

EXAMPLE SCENARIO - Search templates


Page 13: 20080930

EXAMPLE SCENARIO - Search templates


Page 14: 20080930

SYSTEM OVERVIEW

The system includes:

A data repository that holds all of the content collected by the user, organized according to the source webpage and the semantic tags of the webpage elements.

A set of user-defined cards that specify which webpage elements within a relation tree should be displayed, and their visual arrangement.

A set of search templates, each of which includes a set of websites and possibly relations for those websites.
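The three components can be pictured as a small data model. The following is a minimal sketch, not the authors' implementation: every class and field name (Record, Relation, Repository, Card, SearchTemplate, the grid-style layout) is a hypothetical assumption, and the relation tree is simplified to a flat list of relations.

```python
from dataclasses import dataclass, field

# Hypothetical data model for the three components of the system overview.
# All names are illustrative assumptions, not the authors' code.

Record = dict[str, str]  # semantic tag -> extracted text for one webpage item


@dataclass
class Relation:
    """Directed connection: source_site.source_tag -> target_site.target_tag."""
    source_site: str
    source_tag: str
    target_site: str
    target_tag: str


@dataclass
class Repository:
    """Collected content grouped by source website; also stores relations."""
    records: dict[str, list[Record]] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add(self, site: str, record: Record) -> None:
        self.records.setdefault(site, []).append(record)


@dataclass
class Card:
    """Which tagged elements to display, and where (assumed grid layout)."""
    name: str
    layout: dict[str, tuple[int, int]]  # tag -> (row, column)

    def render(self, record: Record) -> list[str]:
        # Emit elements in row-major slot order, skipping tags the record lacks.
        slots = sorted(self.layout.items(), key=lambda item: item[1])
        return [f"{tag}: {record[tag]}" for tag, _ in slots if tag in record]


@dataclass
class SearchTemplate:
    """A set of websites to search, plus optional relations among them."""
    websites: list[str]
    relations: list[Relation] = field(default_factory=list)


repo = Repository()
repo.add("nwsource.com", {"name": "Cafe Alpha", "cuisine": "Thai"})  # made-up record
card = Card("restaurant", {"cuisine": (1, 0), "name": (0, 0)})
print(card.render(repo.records["nwsource.com"][0]))
# ['name: Cafe Alpha', 'cuisine: Thai']
```

Keeping the card separate from the stored records mirrors the slide's point that cards are user-defined views over the repository rather than copies of the content.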


Page 15: 20080930

RETRIEVAL USING RELATIONSHIPS

A relation is defined as a directed connection from $\mathit{tag}_i$ of websiteA to $\mathit{tag}_j$ of websiteB. All relations are stored in the data repository and are available to the user at any time.

More formally, the execution of a relation can be expressed as a database query. For a given relation $r = \mathit{websiteA}.\mathit{tag}_i \rightarrow \mathit{websiteB}.\mathit{tag}_j$, content for any new data record from websiteA is collected for $\mathit{tag}_i$ as a JOIN operation, or equivalently with the following SQL pseudo-query:

SELECT * FROM websiteB WHERE websiteB.tag_j = websiteA.tag_i
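To make the JOIN reading concrete, here is a small sketch using Python's built-in sqlite3 module. It assumes both websites' extracted content already sits in local tables, which the real system does not do (websiteB content is gathered by Web search); the table rows are made-up values, and the relation shown is websiteA.name → websiteB.name, i.e. $\mathit{tag}_i = \mathit{tag}_j = $ name.

```python
import sqlite3

# Toy tables (hypothetical data) standing in for content extracted from two
# websites; each column plays the role of a semantic tag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websiteA (name TEXT, cuisine TEXT);
    CREATE TABLE websiteB (name TEXT, rating REAL);
    INSERT INTO websiteA VALUES ('Cafe Alpha', 'Thai');
    INSERT INTO websiteB VALUES ('Cafe Alpha', 4.5), ('Cafe Beta', 3.0);
""")

# Relation r = websiteA.name -> websiteB.name: for each websiteA record,
# fetch the websiteB rows whose name matches, i.e. an ordinary inner JOIN.
rows = conn.execute("""
    SELECT websiteA.name, websiteA.cuisine, websiteB.rating
    FROM websiteA
    JOIN websiteB ON websiteB.name = websiteA.name
""").fetchall()
print(rows)  # [('Cafe Alpha', 'Thai', 4.5)]
conn.close()
```

Unlike the pseudo-query, this version names websiteA in the FROM clause so that it is valid SQL; the unmatched websiteB row ('Cafe Beta') is dropped, as an inner join would do.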


Page 16: 20080930

RETRIEVAL USING RELATIONSHIPS (Query formulation)

To formulate the keyword query, we typically use only the extracted text content. We find that this type of query is usually sufficient and returns the appropriate result within the top eight search results.

The query can also be reformulated using heuristics. We found this approach particularly effective for situations in which something is described in multiple ways or is part of multiple categories.

Other approaches for reformulating queries include using the semantic tag associated with the webpage element or using additional webpage elements.


Page 17: 20080930

RETRIEVAL USING RELATIONSHIPS (Search result comparison)

For each query we extract the first eight search results and rank the extracted content according to its similarity to the webpage content that triggered the query.

To compute similarity, we compare the extracted webpage elements using the correspondence specified in the relation that triggered the search.

For example, when collecting content for the “Ambrosia” restaurant from nwsource.com, the system issues the query “Ambrosia”, limiting the results to the yelp.com domain.
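The ranking step can be sketched as follows. The similarity measure is an assumption: the slides do not say which one is used, so this sketch swaps in case-insensitive string similarity from Python's difflib.SequenceMatcher, averaged over the tag pairs given by the relation's correspondence. The function names and the sample results are hypothetical.

```python
from difflib import SequenceMatcher

MAX_RESULTS = 8  # the slides state the search API returns at most eight results


def similarity(a: str, b: str) -> float:
    # Assumed measure: case-insensitive character-sequence similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_results(source: dict[str, str],
                 results: list[dict[str, str]],
                 correspondence: dict[str, str]) -> list[dict[str, str]]:
    """Order search results by similarity to the record that triggered the query.

    source:         tag -> text for the triggering webpage element(s)
    results:        tag -> text extracted from each search-result page
    correspondence: source tag -> result tag, taken from the relation
    """
    def score(result: dict[str, str]) -> float:
        pairs = [(source[s], result[t]) for s, t in correspondence.items()
                 if s in source and t in result]
        if not pairs:
            return 0.0
        return sum(similarity(a, b) for a, b in pairs) / len(pairs)

    # Only the first MAX_RESULTS results are considered; best match first.
    return sorted(results[:MAX_RESULTS], key=score, reverse=True)


nwsource = {"name": "Ambrosia"}
yelp_results = [{"title": "Ambrosia Cafe"}, {"title": "Pizza Place"},
                {"title": "Ambrosia"}]
ranked = rank_results(nwsource, yelp_results, {"name": "title"})
print([r["title"] for r in ranked])
# ['Ambrosia', 'Ambrosia Cafe', 'Pizza Place']
```

Scoring only the tag pairs named by the relation keeps unrelated page text (reviews, ads) from influencing the match.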


Page 18: 20080930

RETRIEVAL USING RELATIONSHIPS (Limitations)

Content is extracted from only eight search results, because the Google AJAX Search API limits the search results to a maximum of eight.

To handle dynamic webpages, in subsequent work we hope to leverage research into macro recording systems such as WebVCR (2000), Turquoise (1997), Web Macros (1999), TrIAs (2000), PLOW (2006), and Creo (2006).

Madhavan et al. (2007)

In the current implementation, the user can specify only one-to-one relations.

Page 19: 20080930

AUTHORING CARDS

The user can view his collection of Web content through cards. Cards are persistent, can be reused, and can be shared with others.

The card designer does not currently provide capabilities for specifying interaction; a possible direction is to combine it with the Exhibit API (2007).


Page 20: 20080930

AUTHORING CARDS


Page 21: 20080930

TEMPLATE-BASED SEARCH


Page 22: 20080930

EXPLORATORY USER STUDY

Participants: four graduate students and two university staff members.

Relations: explore exposing possible relations to the user as he collects new content.

Cards: a good card designer should make it possible to create cards quickly but also give the user control.

Search templates: give the user feedback about the available search results.


Page 23: 20080930

CONCLUSIONS AND FUTURE WORK

This work combines content extraction and Web search to provide services and tools that are much needed and can help users with challenging information tasks.

Such a web of relationships can enable a new shift in Web applications and bring about a World Wide Web that is both more personal and collaborative.

They plan to continue evolving the card designer to provide light-weight card authoring for the novice.

They plan to explore approaches for providing more feedback so that the user can understand search results and quickly and easily iterate through queries.

They hope to explore which websites people relate together, how often they create new cards, and how well they can use search templates.
