Mira Dontcheva, Steven M. Drucker, David Salesin, Michael F. Cohen
UIST '07, Oct. 2007
Lin Yen Ling, 2008-09-30
1/23
OUTLINE
Introduction
Related Work
The Summaries Framework
Example Scenario
System Overview
Retrieval Using Relationships
Authoring Cards
Template-based Search
Exploratory User Study
Conclusions and Future Work
2/23
INTRODUCTION
The work focuses on helping people interact with and gather Web content.
The goal is to lower the effort necessary for collecting, organizing, managing, and sharing that content.
The authors present three new techniques that build on the existing summaries framework and interaction paradigms.
3/23
INTRODUCTION
Three new techniques:
An interaction technique that allows users to specify relations between websites and use these relations to automatically collect data from multiple websites.
An interface for merging content from multiple websites and organizing it visually.
A novel search paradigm for collecting content from the Web with search templates.
4/23
RELATED WORK
Managing web content
  WebBook (1996), Data Mountain (1998), TopicShop (2003)
  Hunter Gatherer (2002), Internet Scrapbook (1998)
Semantic Web community
  Piggy Bank (2005), Thresher (2005)
Summaries framework (2006)
  Goal: give the user interactive tools for specifying relations between disparate data sources.
5/23
RELATED WORK
Collecting content using relations
  Data integration
    Complementary to database research
  End-user programming for the Web
    Chickenfoot (2005), RecipeSheet (2006), C3W (2004), Marmite (2007), and Yahoo Pipes
    Simple graphical interface for mixing content from different sources.
Interactive layout editing
  Sketchpad: a man-machine graphical communication system (1963)
  Inferring constraints from multiple snapshots (1993)
  Inferring graphical constraints with Rockit (1993)
  Similar to commercial HTML editors.
6/23
RELATED WORK
Formatting search results
  Stuff I’ve Seen (2003)
  Clusty (2004): goes one step further and clusters the search results according to topic.
  Grokker (2005)
  Goal: go beyond clustering and reorganizing URLs.
7/23
THE SUMMARIES FRAMEWORK
The system is built on top of the summaries framework.
  Provides an interface for interactively creating extraction patterns for webpages.
  Implemented as a browser extension.
  Written in JavaScript and XUL.
The system uses the extraction techniques that are already part of the summaries framework.
  DOM, context-based rules
The focus is on new applications for the extracted content.
8/23
EXAMPLE SCENARIO
Shows the steps taken by a user as he looks for a restaurant for a night out in Seattle.
9/23
EXAMPLE SCENARIO - Relations
10/23
EXAMPLE SCENARIO - Cards
11/23
EXAMPLE SCENARIO - Search templates
12/23
EXAMPLE SCENARIO - Search templates
13/23
SYSTEM OVERVIEW
The system includes:
A data repository
  Holds all of the content collected by the user, organized according to the source webpage and the semantic tags of the webpage elements.
A set of user-defined cards
  Specify which webpage elements within a relation tree should be displayed and their visual arrangement.
A set of search templates
  Each includes a set of websites and possibly relations for those websites.
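The three components above can be pictured as a small data model. The following is only a minimal sketch: the class names, field names, and sample values are all hypothetical illustrations and are not taken from the actual system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical directed link from one website's tag to another's."""
    source_site: str
    source_tag: str
    target_site: str
    target_tag: str


@dataclass
class Card:
    """Hypothetical card: which (website, tag) elements to show, in display order."""
    name: str
    elements: List[Tuple[str, str]]


@dataclass
class SearchTemplate:
    """Hypothetical template: a set of websites plus optional relations between them."""
    websites: List[str]
    relations: List[Relation] = field(default_factory=list)


@dataclass
class DataRepository:
    """Collected content, indexed by source website; each record maps tag -> text."""
    records: Dict[str, List[Dict[str, str]]] = field(default_factory=dict)

    def add(self, website: str, record: Dict[str, str]) -> None:
        self.records.setdefault(website, []).append(record)

    def values(self, website: str, tag: str) -> List[str]:
        """Return every stored value for one tag of one website."""
        return [r[tag] for r in self.records.get(website, []) if tag in r]


# Made-up sample content for illustration only.
repo = DataRepository()
repo.add("nwsource.com", {"name": "Ambrosia", "address": "placeholder address"})
repo.add("yelp.com", {"name": "Ambrosia", "review": "placeholder review"})

name_relation = Relation("nwsource.com", "name", "yelp.com", "name")
template = SearchTemplate(["nwsource.com", "yelp.com"], [name_relation])
card = Card("restaurant", [("nwsource.com", "name"), ("yelp.com", "review")])
```

Indexing records by website and then by tag mirrors the slide's description of the repository; a real implementation may organize storage quite differently.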
14/23
RETRIEVAL USING RELATIONSHIPS
A relation is defined as a directed connection from tag_i of websiteA to tag_j of websiteB.
All relations are stored in the data repository and are available to the user at any time.
More formally, the execution of a relation can be expressed as a database query. For a given relation r, where r = websiteA.tag_i → websiteB.tag_j, content for any new data record from websiteA for tag_i is collected as a JOIN operation, or the following SQL pseudo-query:
  SELECT * FROM websiteB
  WHERE websiteB.tag_j = websiteA.tag_i
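The pseudo-query above can be made concrete with an in-memory SQLite database. This is a sketch of the database analogy only: the tables, columns, and rows are invented for illustration, and the function `run_relation` is a hypothetical helper. The slides indicate that the real system gathers matching content through Web search rather than from local tables.

```python
import sqlite3

# Invented sample tables standing in for two websites.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websiteA (name TEXT, address TEXT);
    CREATE TABLE websiteB (name TEXT, review TEXT);
    INSERT INTO websiteA VALUES ('Ambrosia', 'placeholder address');
    INSERT INTO websiteB VALUES ('Ambrosia', 'placeholder review');
    INSERT INTO websiteB VALUES ('Other Place', 'unrelated review');
""")


def run_relation(conn, tag_i, tag_j):
    """Execute r = websiteA.tag_i -> websiteB.tag_j as an inner join."""
    # Column names cannot be bound as SQL parameters, so validate them first.
    if not (tag_i.isidentifier() and tag_j.isidentifier()):
        raise ValueError("tags must be plain column names")
    query = (
        "SELECT a.*, b.* FROM websiteA AS a "
        f"JOIN websiteB AS b ON b.{tag_j} = a.{tag_i}"
    )
    return conn.execute(query).fetchall()


rows = run_relation(conn, "name", "name")
print(rows)
```

Each returned row pairs a websiteA record with the websiteB record whose tag_j value equals its tag_i value; records without a match are dropped, as in any inner join.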
15/23
RETRIEVAL USING RELATIONSHIPS (Query formulation)
To formulate the keyword query, we typically use only the extracted text content.
  We find that this type of query is usually sufficient and returns the appropriate result within the top eight search results.
The query can also be reformulated using heuristics.
  We found this approach particularly effective for situations in which something is described in multiple ways or is part of multiple categories.
Other approaches for reformulating queries include using the semantic tag associated with the webpage element or using additional webpage elements.
16/23
RETRIEVAL USING RELATIONSHIPS (Search result comparison)
For each query we extract the first eight search results and rank the extracted content according to similarity to the webpage content that triggered the query.
To compute similarity, we compare the extracted webpage elements using the correspondence specified in the relation that triggered the search.
For example, when collecting content for the “Ambrosia” restaurant from nwsource.com, the system issues the query “Ambrosia”, limiting the results to the yelp.com domain.
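The query-and-rank step can be sketched as follows. Several parts are assumptions: the slides do not say how results are limited to a domain (a `site:` operator is assumed here), nor which similarity measure is used (Python's `difflib.SequenceMatcher` is substituted as a stand-in). The function names and candidate records are invented for illustration.

```python
from difflib import SequenceMatcher

MAX_RESULTS = 8  # the slides state that only the first eight results are used


def build_query(text, domain):
    """Keyword query from extracted text; the site: restriction is an assumption."""
    return f"{text} site:{domain}"


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; not the system's actual measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_results(trigger, candidates, source_tag, target_tag):
    """Rank extracted results by similarity on the tags linked by the relation."""
    top = candidates[:MAX_RESULTS]
    return sorted(
        top,
        key=lambda c: similarity(trigger[source_tag], c.get(target_tag, "")),
        reverse=True,
    )


# Invented records standing in for content extracted from search results.
trigger = {"name": "Ambrosia"}
candidates = [
    {"name": "Ambrosia Bakery"},
    {"name": "Ambrosia"},
    {"name": "Some Other Cafe"},
]

query = build_query(trigger["name"], "yelp.com")
ranked = rank_results(trigger, candidates, "name", "name")
print(query)
print([c["name"] for c in ranked])
```

Comparing only the tags named in the relation follows the slide's description of using the correspondence specified in the relation that triggered the search.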
17/23
RETRIEVAL USING RELATIONSHIPS (Limitations)
The system extracts content from only eight search results because the Google AJAX Search API limits the search results to a maximum of eight.
To handle dynamic webpages, in subsequent work we hope to leverage research into macro recording systems such as WebVCR (2000), Turquoise (1997), Web Macros (1999), TrIAs (2000), PLOW (2006), and Creo (2006).
Madhavan et al. (2007)
In the current implementation we allow the user to specify only one-to-one relations.
18/23
AUTHORING CARDS
The user can view his collection of Web content through cards.
Cards are persistent, can be reused, and can be shared with others.
The system does not currently provide capabilities for specifying interaction.
  One option is to combine it with the Exhibit API (2007).
19/23
AUTHORING CARDS
20/23
TEMPLATE-BASED SEARCH
21/23
EXPLORATORY USER STUDY
Participants: four graduate students and two university staff members.
Relations
  Explore exposing possible relations to the user as he collects new content.
Cards
  A good card designer should make it possible to create cards quickly but also give the user control.
Search Templates
  Give the user feedback about the available search results.
22/23
CONCLUSIONS AND FUTURE WORK
This work combines content extraction and Web search to provide services and tools that are much needed and can help users with challenging information tasks.
Such a web of relationships can enable a new shift in Web applications and bring about a World Wide Web that is both more personal and collaborative.
They plan to continue evolving the card designer to provide light-weight card authoring for the novice.
They plan to explore approaches for providing more feedback so that the user can understand search results and quickly and easily iterate through queries.
They hope to explore which websites people relate together, how often they create new cards, and how well they can use search templates.
23/23