20080930

Mira Dontcheva Steven M. Drucker David Salesin Michael F. Cohen

Oct. 2007 UIST '07

Lin Yen Ling 20080930

1/23

OUTLINE

Introduction
Related Work
The Summaries Framework
Example Scenario
System Overview
Retrieval Using Relationships
Authoring Cards
Template-based Search
Exploratory User Study
Conclusions and Future Work

2/23

INTRODUCTION

The work focuses on helping people interact with and gather Web content.

The goal is to lower the effort necessary for collecting, organizing, managing, and sharing that content.

They present three new techniques that build on the existing summaries framework and interaction paradigms.

3/23

INTRODUCTION

Three new techniques:

An interaction technique that allows users to specify relations between websites and use these relations to automatically collect data from multiple websites.

An interface for merging content from multiple websites and organizing it visually.

A novel search paradigm for collecting content from the Web with search templates.

4/23

RELATED WORK

Managing web content

WebBook (1996), Data Mountain (1998), TopicShop (2003)

Hunter Gatherer (2002), Internet Scrapbook (1998)

Semantic Web community

Piggy Bank (2005), Thresher (2005)

Summaries framework (2006)

The aim is to give the user interactive tools for specifying relations between disparate data sources.

5/23

RELATED WORK

Collecting content using relations

Data integration

Complementary to database research

End-user programming for the Web

Chickenfoot (2005), RecipeSheet (2006), C3W (2004), Marmite (2007), and Yahoo Pipes

Simple graphical interfaces for mixing content from different sources.

Interactive layout editing

Sketchpad: a man-machine graphical communication system (1963). Inferring constraints from multiple snapshots (1993). Inferring graphical constraints with Rockit (1993).

Similar to commercial HTML editors.

6/23

RELATED WORK

Formatting search results

Stuff I’ve Seen (2003)

Clusty (2004): goes one step further and clusters the search results according to topic.

Grokker (2005)

The goal is to go beyond clustering and reorganizing URLs.

7/23

THE SUMMARIES FRAMEWORK

Built on top of the summaries framework.

Provides an interface for interactively creating extraction patterns for webpages.

Implemented as a browser extension.

Written in JavaScript and XUL.

Uses the extraction techniques that are already part of the summaries framework (DOM, context-based rules).

Focuses on new applications for the extracted content.

8/23

EXAMPLE SCENARIO

Shows the steps taken by a user as he looks for a restaurant for a night out in Seattle.

9/23

EXAMPLE SCENARIO - Relations

10/23

EXAMPLE SCENARIO - Cards

11/23

EXAMPLE SCENARIO - Search templates

12/23

EXAMPLE SCENARIO - Search templates

13/23

SYSTEM OVERVIEW

The system includes:

A data repository that holds all of the content collected by the user, organized according to the source webpage and the semantic tags of the webpage elements.

A set of user-defined cards that specify which webpage elements within a relation tree should be displayed and their visual arrangement.

A set of search templates, each of which includes a set of websites and possibly relations for those websites.

14/23

RETRIEVAL USING RELATIONSHIPS

A relation is defined as a directed connection from tag_i of websiteA to tag_j of websiteB.

All relations are stored in the data repository and are available to the user at any time.

More formally, the execution of a relation can be expressed as a database query. For a given relation r, where r = websiteA.tag_i → websiteB.tag_j, the system collects content for any new data record from websiteA for tag_i as a JOIN operation, or the following SQL pseudo-query:

    SELECT * FROM websiteB
    WHERE websiteB.tag_j = websiteA.tag_i
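The JOIN described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions, not the authors' implementation (which is a JavaScript/XUL browser extension): the `Relation` class, the `execute_relation` function, the dictionary-based repository, and the sample rating values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Directed connection websiteA.tag_i -> websiteB.tag_j (hypothetical type)."""
    source_site: str
    source_tag: str
    target_site: str
    target_tag: str


def execute_relation(relation, new_record, repository):
    """Return records of the target site whose target tag equals the
    value of the source tag in the newly collected record (an equi-join)."""
    key = new_record.get(relation.source_tag)
    if key is None:
        return []  # the new record has no value to join on
    return [
        record
        for record in repository.get(relation.target_site, [])
        if record.get(relation.target_tag) == key
    ]


# Hypothetical repository contents: site name -> list of tagged records.
repository = {
    "yelp.com": [
        {"name": "Ambrosia", "rating": "4 stars"},
        {"name": "Serafina", "rating": "5 stars"},
    ]
}
relation = Relation("nwsource.com", "name", "yelp.com", "name")
matches = execute_relation(relation, {"name": "Ambrosia"}, repository)
print(matches)
```

In the system described here the target records are not already stored locally; they are fetched through Web search, so the exact equality test above stands in for the search and comparison steps covered on the following slides.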

15/23

RETRIEVAL USING RELATIONSHIPS (Query formulation)

To formulate the keyword query, we typically use only the extracted text content.

We find that this type of query is usually sufficient and returns the appropriate result within the top eight search results.

The query can also be reformulated using heuristics. We found this approach particularly effective for situations in which something is described in multiple ways or is part of multiple categories.

Other approaches for reformulating queries include using the semantic tag associated with the webpage element or using additional webpage elements.
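A minimal sketch of query formulation follows. The slides do not say which heuristics are used, so everything here is an assumption made for illustration: the function names `build_query` and `reformulate`, the use of the `site:` search operator to restrict results to one domain, and the heuristic of splitting a multi-part description on commas, slashes, and semicolons.

```python
import re


def build_query(text, domain):
    """Build a keyword query from extracted text, restricted to one domain."""
    cleaned = " ".join(text.split())  # collapse whitespace left by extraction
    return f'"{cleaned}" site:{domain}'


def reformulate(text, domain):
    """Assumed heuristic: issue one query per part of a multi-part description."""
    parts = [part.strip() for part in re.split(r"[,/;]", text) if part.strip()]
    return [build_query(part, domain) for part in parts]


print(build_query("  Ambrosia ", "yelp.com"))
print(reformulate("Italian, Pizza", "yelp.com"))
```

Splitting is one plausible way to handle an item that belongs to several categories; the alternatives named on the slide, such as adding the semantic tag or further webpage elements to the query, would extend `build_query` instead.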

16/23

RETRIEVAL USING RELATIONSHIPS (Search result comparison)

For each query we extract the first eight search results and rank the extracted content according to its similarity to the webpage content that triggered the query.

To compute similarity, we compare the extracted webpage elements using the correspondence specified in the relation that triggered the search.

For example, when collecting content for the “Ambrosia” restaurant from nwsource.com, the system issues the query “Ambrosia”, limiting the results to the yelp.com domain.
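The ranking step can be sketched as below. The slides do not name the similarity measure, so this example substitutes the character-based ratio from Python's standard `difflib.SequenceMatcher`; the function names, the `MAX_RESULTS` constant, and the sample records are hypothetical.

```python
from difflib import SequenceMatcher

MAX_RESULTS = 8  # the slides report that only the first eight results are used


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_results(trigger_value, results, target_tag):
    """Rank extracted records by how closely the element named by the
    relation's target tag matches the value that triggered the query."""
    candidates = results[:MAX_RESULTS]
    scored = [
        (similarity(trigger_value, record.get(target_tag, "")), record)
        for record in candidates
    ]
    # Sort on the score only, best match first; ties keep search order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


# Hypothetical records extracted from search results.
results = [
    {"name": "Ambrosia Bakery"},
    {"name": "Ambrosia"},
    {"name": "Cafe Flora"},
]
ranked = rank_results("Ambrosia", results, "name")
print(ranked[0])
```

Comparing only the element named in the relation mirrors the correspondence described above; a fuller version might combine scores across several corresponding elements.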

17/23

RETRIEVAL USING RELATIONSHIPS (Limitations)

We extract content from only eight search results because the Google AJAX Search API limits the search results to a maximum of eight.

The system does not yet handle dynamic webpages. To handle them, in subsequent work we hope to leverage research into macro recording systems such as WebVCR (2000), Turquoise (1997), Web Macros (1999), TrIAs (2000), PLOW (2006), and Creo (2006).

Madhavan et al. (2007)

In the current implementation we allow the user to specify only one-to-one relations.

18/23

AUTHORING CARDS

The user can view his collection of Web content through cards.

Cards are persistent, can be reused, and can be shared with others.

The system does not currently provide capabilities for specifying interaction.

It could be combined with the Exhibit API (2007).

19/23

AUTHORING CARDS

20/23

TEMPLATE-BASED SEARCH

21/23

EXPLORATORY USER STUDY

Participants: four graduate students and two university staff members.

Relations

Explore exposing possible relations to the user as he collects new content.

Cards

A good card designer should make it possible to create cards quickly but also give the user control.

Search Templates

Give the user feedback about the available search results.

22/23

CONCLUSIONS AND FUTURE WORK

This work combines content extraction and Web search to provide services and tools that are much needed and can help users with challenging information tasks.

Such a web of relationships can enable a new shift in Web applications and bring about a World Wide Web that is both more personal and collaborative.

They plan to continue evolving the card designer to provide light-weight card authoring for the novice.

They plan to explore approaches for providing more feedback so that the user can understand search results and quickly and easily iterate through queries.

They hope to explore which websites people relate together, how often they create new cards, and how well they can use search templates.

23/23