Making Mashups with Marmite

Post on 31-Dec-2015

Jeff Wong, Jason I. Hong

Carnegie Mellon University

The Big Picture Problem

• Lots of content out there on the web

– But not always in a form amenable to your needs

– Ex. Easy to get a list of hotels in San Jose, not so easy to sort by distance to convention center

• Two observations:

– In many cases, all of the data and services people need already exist, but not connected together

– Unlikely that a web site can predict all possible needs

A Solution: Mashups

• Rapidly growing community of users creating “mashups” combining content from multiple web sites

– Ex. Housingmaps.com

– Ex. MySpace child predators

– Ex. Friendster locations

– Ex. Most popular videos on YouTube, Yahoo Video, …

• ProgrammableWeb.com statistics

– ~1500 mashups created since April 2005

– 356 open web-based APIs available

But Creating Mashups is Hard

• Requires lots of skill to create a mashup

– Ex. Housingmaps creator has PhD in computer science

– Ex. MySpace child predator list took months

• Requires programming expertise in many areas

– Web crawling

– Text parsing

– Pattern matching

– Databases

– HTML

Marmite: End-User Programming for Mashups

• Main idea: make it easy to create web mashups

• Use a dataflow approach connecting small operators

– Inspired by Unix pipes and Apple’s Automator

• Example:

– Get all events from Upcoming.org

– Filter out events that are too old

– Put them all onto a map

• Runs inside of a standard web browser
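The example flow above can be sketched as a chain of small operators, each taking a table of rows and returning a new one. This is only an illustration in Python, not Marmite's own code; the function names, the sample events, and the `plot_on_map` stand-in are all assumptions made for the sketch.

```python
from datetime import date

# Hypothetical sample rows standing in for events fetched from a site such as Upcoming.org.
SAMPLE_EVENTS = [
    {"name": "Jazz Night", "date": date(2007, 4, 20), "address": "500 Forbes Ave, Pittsburgh, PA"},
    {"name": "Book Fair", "date": date(2007, 5, 2), "address": "4400 Forbes Ave, Pittsburgh, PA"},
    {"name": "Film Festival", "date": date(2007, 5, 10), "address": "100 Fifth Ave, Pittsburgh, PA"},
]

def get_events():
    """Source operator: returns copies of the sample rows (a real source would query a web site)."""
    return [dict(row) for row in SAMPLE_EVENTS]

def filter_old(rows, today):
    """Processor operator: keep only events dated today or later."""
    return [row for row in rows if row["date"] >= today]

def plot_on_map(rows):
    """Sink operator: stand-in for a map service; returns one marker label per row."""
    return [f'{row["name"]} @ {row["address"]}' for row in rows]

def run_flow(today):
    """Connect the operators in order, like commands in a Unix pipe."""
    return plot_on_map(filter_old(get_events(), today))
```

Calling `run_flow(date(2007, 5, 1))` keeps the two events that fall on or after that day and hands them to the sink. Each operator only sees the rows produced by the one before it, which is the property the dataflow approach relies on.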

[Figure: Marmite interface, with labeled areas: Set of Operators, Data Flow View, Data View]

Using Marmite (Envisioned)

• Extract content from one or more web pages

– names, addresses, dates, phone #, URLs

• Process it in a data flow manner

– filtering out values or adding metadata

– integrating with other data sources (similar to a database join operation)

• Direct the output to a variety of sinks

– databases, map services, text files, visualizations, web pages, or source code that can be further edited
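The "integrating with other data sources" step is compared above to a database join. A minimal sketch of such a join over rows held as dictionaries might look like the following; the `join_rows` helper, the column names, and the sample values (including the distances) are made up for illustration and are not taken from Marmite.

```python
def join_rows(left, right, key):
    """Inner join: pair each left row with every right row sharing the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            merged = dict(row)
            merged.update(match)  # right-hand columns are added to the left-hand row
            joined.append(merged)
    return joined

# Hypothetical data: a hotel list from one site, distances from another.
hotels = [
    {"hotel": "Hotel A", "zip": "95113"},
    {"hotel": "Hotel B", "zip": "95110"},
    {"hotel": "Hotel C", "zip": "94043"},
]
distances = [
    {"zip": "95113", "miles_to_convention_center": 0.3},
    {"zip": "95110", "miles_to_convention_center": 1.2},
]

nearby = sorted(
    join_rows(hotels, distances, "zip"),
    key=lambda row: row["miles_to_convention_center"],
)
```

Rows with no match on the key (Hotel C here) drop out, as in an inner join; a tool aimed at end users might prefer to keep them with empty cells instead.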

Marmite

• Motivation and Examples

• Features and Design Rationale

• User Evaluation

Features and Design Rationale

• Conducted a series of quick evaluations to understand design space and potential problems

– Automator

– Lo-fi prototypes

Automator

Informal Automator Evaluation

• Had three novices try three simple web-based tasks

– Warm-up task

– Traverse a set of web pages

– Download a set of images

• Some findings:

– Some difficulties knowing how to start and what to do next

– Little feedback about state of system between operations

– Difficult to iterate due to network speed issues

Lo-Fi Prototypes

• 6 paper prototypes with 20 participants

Design Solutions

• Problem: how to start and what to do next

• Solution: Suggest next actions

– Weak data typing to find types (addresses, numbers, etc)

– Filter operators to only show relevant ones

– Suggest operators that might be applicable
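One way to picture "weak data typing" driving suggestions: guess a loose type for each column from its values, then offer only the operators whose input type is present. The patterns, type names, and operator catalogue below are hypothetical and far cruder than a real tool would need; they are a sketch of the idea, not Marmite's rules.

```python
import re

# Hypothetical, deliberately loose patterns; a column matching none of them is plain "text".
TYPE_PATTERNS = [
    ("number", re.compile(r"^-?\d+(\.\d+)?$")),
    ("url", re.compile(r"^https?://\S+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("address", re.compile(r"^\d+\s+\w+.*\b(St|Ave|Blvd|Rd|Dr)\b", re.IGNORECASE)),
]

# Hypothetical operator catalogue: operator name -> column type it needs as input.
OPERATORS = {
    "Geocode address": "address",
    "Filter by date": "date",
    "Follow links": "url",
    "Filter by number": "number",
}

def guess_type(values):
    """Return the first type whose pattern matches more than half of the non-empty values."""
    values = [v.strip() for v in values if v.strip()]
    for name, pattern in TYPE_PATTERNS:
        hits = sum(1 for v in values if pattern.match(v))
        if values and hits / len(values) > 0.5:
            return name
    return "text"

def suggest_operators(table):
    """List operators whose input type appears among the guessed column types."""
    present = {guess_type(column) for column in table.values()}
    return sorted(op for op, needed in OPERATORS.items() if needed in present)

table = {
    "title": ["Jazz Night", "Book Fair"],
    "where": ["500 Forbes Ave", "4400 Forbes Ave"],
    "when": ["2007-04-20", "2007-05-02"],
}
```

The majority threshold is what makes the typing "weak": a column can contain a few malformed values and still be treated as, say, a date column.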

Design Solutions

• Problem: little feedback about state of system between operations

• Solution: link data flow and data view together

– Many systems take program-centric view (ex. Automator) or data-centric view (ex. spreadsheets)

– Use hybrid data flow / data view, showing an operation and its effects together

– Data view usually “spreadsheet”, other views possible too (for example, maps)

Design Solutions

• Problem: difficult to iterate due to network speeds

• Solution: cache data, let people “replay” data

– Reload, pause, play
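The caching idea can be sketched as a wrapper around a slow source: "play" reuses stored rows, and "reload" goes back to the network. The class and method names here are assumptions for illustration, and "pause" is not modeled.

```python
class CachedSource:
    """Wraps a slow fetch function so repeated runs reuse the stored result."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable that performs the slow network request
        self._cache = None
        self.fetch_count = 0   # how many times the fetch function was actually called

    def play(self):
        """Return cached rows, fetching only if nothing is cached yet."""
        if self._cache is None:
            self.reload()
        return list(self._cache)

    def reload(self):
        """Discard the cache and fetch fresh rows."""
        self._cache = list(self._fetch())
        self.fetch_count += 1
        return list(self._cache)

def fake_fetch():
    # Stand-in for a slow web request.
    return [{"name": "Jazz Night"}, {"name": "Book Fair"}]

source = CachedSource(fake_fetch)
first = source.play()
second = source.play()  # served from the cache, no second fetch
```

With this arrangement a user can tweak downstream operators and replay the flow repeatedly while paying the network cost only once.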

Other Design Findings

• Screen real estate issues

– Collapsible operators, leaving a readable label

Extracting Generic Content

• Can’t have pre-defined extractor operators for every possible web site

– Need a more general way of extracting data from pages

• Developed a generic wizard UI for selecting links

– Content from that set could be extracted via other operators

– Uses Solvent (MIT), an XPath-based algorithm for finding patterns in web pages

• Finds “groups” of related web content based on how HTML is structured
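To give a feel for finding "groups" from HTML structure, the sketch below buckets links by the chain of enclosing tags, a rough stand-in for an XPath. It is not Solvent's algorithm, which the slides do not describe in detail; the `LinkGrouper` class, the void-tag list, and the sample page are assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have an end tag; tracking them would corrupt the tag stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class LinkGrouper(HTMLParser):
    """Groups link URLs by the chain of tags that encloses them."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open tags, outermost first
        self.groups = defaultdict(list)  # tag path -> list of hrefs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.groups["/" + "/".join(self.stack)].append(href)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

def group_links(html):
    """Return a dict mapping each tag path to the links found under it."""
    parser = LinkGrouper()
    parser.feed(html)
    parser.close()
    return dict(parser.groups)

PAGE = """
<html><body>
  <div><a href="/about">About</a></div>
  <ul>
    <li><a href="/hotel/1">Hotel One</a></li>
    <li><a href="/hotel/2">Hotel Two</a></li>
    <li><a href="/hotel/3">Hotel Three</a></li>
  </ul>
</body></html>
"""

groups = group_links(PAGE)
```

Here the three hotel links share the path `/html/body/ul/li/a` and land in one group, while the navigation link sits alone under `/html/body/div/a`. A wizard could then show each group and let the user pick the one they want.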

Operators

• Operators have input types

– Operator uses this to guess which columns it wants

• Operators have output types
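A small sketch of how declared input types could let an operator guess its columns, and how output types describe the columns it adds. The `Operator` record, `bind_columns`, `apply_types`, and the naming of new columns are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    input_types: list   # column types the operator consumes, in order
    output_types: list  # column types the operator adds to the table

def bind_columns(operator, column_types):
    """Pick, for each input type, the first unused column of that type.

    column_types maps column name -> type label. Returns None if any input
    type cannot be satisfied, meaning the operator does not apply.
    """
    chosen = []
    for wanted in operator.input_types:
        match = next(
            (col for col, typ in column_types.items() if typ == wanted and col not in chosen),
            None,
        )
        if match is None:
            return None
        chosen.append(match)
    return chosen

def apply_types(operator, column_types):
    """Return the column types after the operator runs (outputs appended as new columns)."""
    result = dict(column_types)
    for i, typ in enumerate(operator.output_types):
        result[f"{operator.name}_{i}"] = typ
    return result

geocode = Operator("geocode", input_types=["address"], output_types=["latitude", "longitude"])
columns = {"title": "text", "where": "address", "when": "date"}
```

`bind_columns(geocode, columns)` selects the `where` column because it is the only one typed as an address; the guess could still be overridden by the user when several columns share a type.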

Implementation

• JavaScript (for underlying code) and Extensible Binding Language (XBL, for the UI)

• Operators currently in JavaScript

– Ideally could be scriptable in any programming language

– Currently ~15 operators

Evaluation

• Informal user study with 6 people

– 2 novices

– 2 people with spreadsheet experience (formulas)

– 2 people with programming experience

• Tasks (in increasing difficulty)

– Warmup task showing how to retrieve a set of addresses and how to geocode an address

– Search for and filter out events further than a week away

– Compile a list of events from two event services and plot them on a map

– Recreate the housingmaps site

Results

• Three people able to complete all tasks in ~1 hour

– First two users confused about suggested actions (automatically popped up, made manual for other 4 users)

– Novice made some progress, not able to finish all tasks

• Able to re-create housingmaps in ~15 minutes

More Results

• Biggest barrier was understanding the data flow

– Did not understand input and output concept

– Applied operators as one-off, did not realize that it was a static representation of flow

– Did not understand data flow and data view were linked

Future Directions

• Short-term

– Better screen-scraping operators

– More operators

– Better connection with web services (WSDL and REST)

– Better help for starting a data flow

• Long-term

– Intelligence analysis

– Better visualizations

– Location-based services

Conclusions

• Marmite, a tool for creating web-based mashups

– Extract content from one or more web pages

– Process it in a data flow manner

– Direct the output to a variety of sinks

• Hybrid data flow / data view

• User evaluation shows some promising results

Jeff Wong, Jason Hong, Making Mashups with Marmite: Re-purposing Web Content through End-User Programming, CHI 2007

Types of Operators

• Sources

– Add data into Marmite by querying databases, extracting information from web pages, and so on.

• Processors

– Modify, combine, or delete existing rows. Example operators include geocoding (converting street addresses to latitude and longitude) and filtering. Processor operators might add or remove columns as well.

• Sinks

– Redirect the flow of data out of Marmite. Examples include showing data on a map, or saving it to a file or to a web page.
