Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos
Juho Kim, Phu Nguyen, Sarah Weir, Philip J. Guo, Robert C. Miller, Krzysztof Z. Gajos
How-to videos online
Learning from how-to videos is limited by video player interfaces.
Watching Example
Problem in Watching
It’s difficult to navigate to
specific parts you’re interested in.
find
repeat
skip
How-to Video: Step-by-Step Nature
Example step: "Apply gradient map"
Completeness & detail of step-by-step instructions are integral to task performance. (Eiriksdottir and Catrambone, 2011)
Proactive & random access and semantic indices in instructional videos lead to better task performance and learner satisfaction. (Zhang et al., 2006)
Interactivity can help overcome the difficulties of perception and comprehension. Stopping, starting, and replaying an animation can allow reinspection. (Tversky et al., 2002)
Design Insight: Enable step-by-step navigation with high interactivity.
ToolScape: Step-aware video player (work in progress)
• work-in-progress images
• parts with no visual progress
• step labels & links
Goal: enhance existing how-to videos with step-level interactivity & annotation.
Research Questions
Does step-by-step navigation help learners?
Preliminary user study
How can we annotate an existing how-to
video with step-by-step information?
Crowdsourcing annotation workflow
Study: Photoshop Design Tasks
12 novice Photoshop users
manually annotated videos
Conditions: Baseline vs. ToolScape
With ToolScape, learners will…
H1. feel more confident about their design skills.
- self-efficacy gain
H2. believe they produced better designs.
- self-rating on designs produced
H3. actually produce better designs.
- external rating on designs produced
H1. Higher self-efficacy gain with ToolScape
– Four 7-point Likert scale questions
– Mann-Whitney U test (Z=2.06, p<0.05); error bars: standard error
Self-efficacy gain: ToolScape 1.4 vs. Baseline 0.1
H2. Higher self-rating with ToolScape
– One 7-point Likert scale question
– Mann-Whitney U test (Z=2.70, p<0.01); error bars: standard error
Self-rating: ToolScape 5.3 vs. Baseline 3.5
H3. External raters rank ToolScape designs higher (lower rank is better)
– Wilcoxon signed-rank test (W=317, Z=-2.79, p<0.01, r=0.29); error bars: standard error
– Krippendorff's alpha = 0.753
Mean rank: ToolScape 5.7 vs. Baseline 7.3
Non-sequential navigation of the video
Step-level navigation: clicked 8.9 times per task
"It is great for skipping straight to relevant portions of the tutorial."
"It was also easier to go back to parts I missed."
Research Questions
Does step-by-step navigation help learners?
Preliminary user study
How can we annotate an existing how-to
video with step-by-step information?
Crowdsourcing annotation workflow
Annotations for Step-Aware Video Player
• step time
• step label
• before/after results
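The three annotation types above can be sketched as a single record per step. This is an illustrative schema, not ToolScape's actual data model; the field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class StepAnnotation:
    """One annotated step in a how-to video (illustrative schema)."""
    time: float        # seconds into the video where the step occurs
    label: str         # short description, e.g. "Apply gradient map"
    before_frame: str  # path/URL of a frame just before the step
    after_frame: str   # path/URL of a frame just after the step

step = StepAnnotation(12.5, "Apply gradient map",
                      "frames/012_before.png", "frames/013_after.png")
```

A step-aware player can then render the timeline links and thumbnails directly from a list of such records.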
Design Goals for Annotation Method
• domain-independent
• existing videos
• untrained annotators
Crowdsourcing
Multi-stage crowdsourcing workflow
Input video
FIND: When & what are the steps?
VERIFY: Vote & improve
EXPAND: Before/after the steps?
Output timeline
Stage 1. FIND candidate steps
Labeling a step
Time-based Clustering
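Multiple workers submit candidate (time, label) pairs, and nearby timestamps likely refer to the same step. A minimal single-pass sketch of such time-based clustering; the 5-second gap and the grouping rule are assumptions, not the paper's exact parameters:

```python
def cluster_by_time(candidates, gap=5.0):
    """Group candidate steps whose timestamps fall within `gap` seconds.

    candidates: (time_sec, label) pairs collected from multiple workers.
    Returns a list of clusters; each cluster is one likely step.
    """
    clusters = []
    for t, label in sorted(candidates):
        # Open a new cluster when this candidate is far from the previous one.
        if not clusters or t - clusters[-1][-1][0] > gap:
            clusters.append([])
        clusters[-1].append((t, label))
    return clusters

raw = [(10.2, "open gradient map"), (11.0, "apply gradient map"),
       (34.5, "set blend mode"), (35.1, "change blending")]
groups = cluster_by_time(raw)
# → two clusters: one near 10s, one near 35s
```

Each resulting cluster becomes one candidate step whose label set is passed to the VERIFY stage.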
Stage 2. VERIFY steps by voting/improving
Quality control for Stage 2
• Majority voting
• Breaking ties:
  – String matching to combine "similar enough" labels
  – Prefer the longer string:
    "grate three cups of cheese" > "grate cheese"
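These tie-breaking rules can be sketched as: pool near-duplicate labels before counting votes, then prefer the longer, more detailed string. The use of `difflib.SequenceMatcher` and the 0.6 similarity threshold are assumptions standing in for whatever string-matching rule the system actually uses:

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.6):
    """True if two labels are 'similar enough' (assumed measure/threshold)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def pick_label(labels):
    """Choose one step label from several workers' suggestions.

    Near-duplicate labels are pooled into one group (majority voting);
    the largest group wins, and ties are broken by preferring the
    longer, more detailed string.
    """
    groups = []  # each group holds mutually similar labels
    for lab in labels:
        for g in groups:
            if similar(lab, g[0]):
                g.append(lab)
                break
        else:
            groups.append([lab])
    best = max(groups, key=lambda g: (len(g), max(len(l) for l in g)))
    return max(best, key=len)

winner = pick_label(["grate cheese",
                     "grate three cups of cheese",
                     "slice the tomato"])
# → "grate three cups of cheese"
```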
Stage 3. EXPAND with before/after images
Quality control for Stage 3
• Majority voting
• Breaking ties:
  – Pixel diff to combine "similar enough" frames
  – Choose the frame closer to the step
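The frame tie-break mirrors the label one: pool frames whose pixel diff is small, take the most popular pool, and on ties pick the frame nearest the step's time. A minimal sketch with frames as flat lists of grayscale pixel values (a real system would compare decoded video frames; the threshold is an assumption):

```python
def pixel_diff(frame_a, frame_b):
    """Mean absolute per-pixel difference between two same-size frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def pick_frame(submissions, step_time, diff_threshold=10.0):
    """Choose one frame from workers' (time, frame) submissions.

    Frames with a small pixel diff are pooled together (majority
    voting); ties between equally popular pools go to the pool, and
    within it the frame, whose timestamp is closest to the step.
    """
    groups = []  # each group: list of (time, frame)
    for t, frame in submissions:
        for g in groups:
            if pixel_diff(frame, g[0][1]) <= diff_threshold:
                g.append((t, frame))
                break
        else:
            groups.append([(t, frame)])
    best = max(groups,
               key=lambda g: (len(g), -min(abs(t - step_time) for t, _ in g)))
    return min(best, key=lambda item: abs(item[0] - step_time))[1]

dark = [0, 0, 0, 0]
chosen = pick_frame([(9.5, dark), (11.0, [2, 1, 0, 0]),
                     (30.0, [200, 200, 200, 200])], step_time=10.0)
# → the dark frame: its pool has two votes and it is nearest t=10
```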
Evaluation
• Generalizable? 75 Photoshop / cooking / makeup videos
• Accurate? Precision and recall against trained annotators' labels
Across all domains, ~80% precision and recall:
Domain      Precision  Recall
Cooking     0.77       0.84
Makeup      0.74       0.77
Photoshop   0.79       0.79
All         0.77       0.81
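Precision and recall against trained annotators can be computed by matching each extracted step to at most one ground-truth step within a time tolerance. A sketch under that assumption; the 5-second tolerance and greedy matching order are illustrative, not the paper's exact procedure:

```python
def precision_recall(extracted, ground_truth, tol=5.0):
    """Match extracted step times to ground-truth times within `tol` sec.

    Each ground-truth step can absorb at most one extracted step.
    Returns (precision, recall).
    """
    unmatched_gt = sorted(ground_truth)
    hits = 0
    for t in sorted(extracted):
        for gt in unmatched_gt:
            if abs(t - gt) <= tol:
                unmatched_gt.remove(gt)  # each GT step matched at most once
                hits += 1
                break
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall

p, r = precision_recall([10, 31, 70], [12, 30, 50, 68])
# 3/3 extracted steps match → precision 1.0; 3/4 GT steps found → recall 0.75
```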
Conceptual Level Differences
One annotator's single step can be another's several:
• "Now apply the bronzer to your face evenly"
vs.
• "Apply the bronzer to the forehead"
• "Apply the bronzer to the cheekbones"
• "Apply the bronzer to the jawline"
Timing is 2.7 seconds off on average
(ground truth: one step every 17.3 seconds)
Cost: $1.07 per minute of video
• 111 HITs / video (3 workers / task)
• $2.50 / video (Find + Verify)
• $4.85 / video (Find + Verify + Expand)
• $0.32 / step (time + label + before/after)
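Two averages are implied by these figures but not stated on the slide; a quick back-of-envelope check, assuming the per-video, per-step, and per-minute costs all describe the same Find+Verify+Expand runs:

```python
# Figures from the slide (USD).
cost_per_video = 4.85   # Find + Verify + Expand, per video
cost_per_step = 0.32    # time + label + before/after, per step
cost_per_minute = 1.07  # per minute of video

# Implied averages (assumptions, not slide-stated numbers).
steps_per_video = cost_per_video / cost_per_step      # ≈ 15.2 steps
minutes_per_video = cost_per_video / cost_per_minute  # ≈ 4.5 minutes
```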
Contributions
• Study: increased interactivity improved task performance & self-efficacy
• Crowd video annotation method & Find-Verify-Expand design pattern
• Evaluation: fully annotated 75 existing videos across 3 domains with ~80% accuracy
Ongoing Work: Beyond Low-Level Steps
• hierarchical solution structure extraction
Catrambone, R. The subgoal learning model: Creating better examples so that students can solve novel problems. Journal of Experimental Psychology: General, 127 (1998).
Learnersourcing: learners as a crowd
• Motivated, qualified
• Feedback loop between learners & system
Future of How-to Video Learning
What if we had 1000s of
fully annotated videos?
• Flexible learning paths with multiple videos
• Step-level search, recommendation
• Patterns from multiple solutions
Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos
Juho Kim
MIT CSAIL
juhokim.com
Acknowledgement: This work was supported in part by
Quanta Computer & the Samsung Fellowship.