
Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Sep 14, 2014

The keynote talk at CrowdKDD 2012 http://www.cse.ust.hk/~nliu/crowdkdd12/
Transcript
Page 1: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Crowdsourcing beyond …

Kuan-Ta Chen

Institute of Information Science, Academia Sinica

Building Crowdmining Services for Your Own Research

CrowdKDD’12 Aug 12, 2012

Page 2: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

What I’m going to talk about

Crowdsourcing?

Crowdsourcing + Data Mining Research?

Common Fallacies of CS4DM Research

Pomics: A Crowdmining Service

Conclusion

Page 3: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

CrowdKDD’12: Crowdsourcing beyond Mechanical Turk / Kuan-Ta Chen 3

Crowdsourcing = Crowd + Outsourcing

“soliciting solutions via open calls to large-scale communities”

Page 4: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

A more formal definition

“Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.” [1]

[1] Howe, Jeff. Crowdsourcing: A Definition, http://crowdsourcing.typepad.com/

Page 5: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

What Can Crowdsourcing Do?

Page 6: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Brand Tagging

Page 7: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Data Entry

Reward: 4.4 USD/hour

Page 8: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

General Questions

Reward: points on Yahoo! Answers

Page 9: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

When crowdsourcing meets data mining…

Crowdsourcing + Data mining

What’s in here?

Page 10: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Crowdsourcing for Data Mining: Issues

Purposes: Annotation (ground-truth generation), Evaluation, Retrieval, Human-in-the-loop computation

Methodologies: Recruiting, Incentives, Task Design, Workflow, Learning from crowd, Quality control, Cheat detection
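
Of the methodology issues listed here, quality control is commonly approached by collecting redundant labels per item and aggregating them. The following is a minimal sketch of majority voting, a common baseline and not necessarily the approach taken in this talk; the item IDs, labels, and the 2/3 agreement threshold are made up for illustration.

```python
from collections import Counter

def majority_vote(labels):
    """Return (winning_label, agreement_ratio) for one item's worker labels.

    Ties are broken in favor of the label that was seen first.
    """
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)

# Hypothetical worker answers for three images.
answers = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
    "img_003": ["cat", "dog", "bird"],
}

for item, labels in answers.items():
    label, agreement = majority_vote(labels)
    # Items with weak agreement are candidates for extra labels or review.
    note = "" if agreement >= 2 / 3 else "  (low agreement)"
    print(f"{item}: {label} ({agreement:.2f}){note}")
```

Items flagged with low agreement can be sent back for more labels, which is one simple way to trade cost against label quality.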

Page 11: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Crowdsourcing Uses in Data Mining Research

Page 12: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Image Semantics

Reward: 0.04 USD / task

main theme? key objects? unique attributes?

Page 13: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

0.02 USD/ task

Find the photos of revolvers!

Page 14: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

0.01 USD/ task

Human Skeleton

Page 15: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

0.01 USD/ task

Photo Orientation

Page 16: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Perspectives for 3D Objects

Thi Phuong Nghiem, Axel Carlier, Geraldine Morin, and Vincent Charvillat, "Enhancing online 3D products through crowdsourcing," ACM CrowdMM'12.

Page 17: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Web Site Classifier

12 USD / hour

Panos Ipeirotis, “Crowdsourcing using Mechanical Turk: Quality Management and Scalability,” Invited Talk at CSDM 2011.

Page 18: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Photographers’ Intention

to support a task?
to capture a bad feeling?
to preserve a good feeling?
to recall later on?
to publish it online?
to show it to friends and family?

Mathias Lux, Mario Taschwer, and Oge Marques, “A Closer Look at Photographers’ Intentions: a Test Dataset,” ACM CrowdMM’12.

Page 19: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Linguistic Affective Judgement

Affective response (Snow et al. 2008)

USD 0.4 to label 20 headlines (140 labels)

“Closing and cancellations top advice on flu outbreak”
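
The quoted price can be broken down into a per-headline and a per-label cost. This is only a worked calculation from the three figures on the slide; nothing else about the task setup is assumed.

```python
# Figures quoted on the slide.
reward_usd = 0.4   # payment for one task
headlines = 20     # headlines labeled per task
labels = 140       # labels produced per task

labels_per_headline = labels / headlines    # labels collected for each headline
cost_per_headline = reward_usd / headlines  # USD per headline
cost_per_label = reward_usd / labels        # USD per individual label

print(f"{labels_per_headline:.0f} labels per headline")
print(f"{cost_per_headline:.3f} USD per headline")
print(f"{cost_per_label:.4f} USD per label")
```

In other words, each headline receives 7 labels at about 2 cents per headline, or roughly 0.3 cents per label.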

Page 20: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

A Lot More Examples

Document relevance evaluation: Alonso et al. (2008)

Document rating collection: Kittur et al. (2008)

Noun compound paraphrasing: Nakov (2008)

Person name resolution: Su et al. (2007)

And so on...

Page 21: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

THE COMMON FALLACIES -- EXPERIENCES FROM CROWDMM’12

Thanks to CrowdMM’12 co-organizers: Wei-Tsang Ooi, Martha Larson, and Wei-Ta Chu; also thanks to “Crowdsourcing for Multimedia” SI co-guest-editors Paul Bennent and Matt Lease.

Page 22: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Common Fallacies #1

Crowdsourcing is NOT JUST conducting user studies

The crowd is uncontrollable, and tasks are performed in uncontrolled conditions

How to manage the crowd?

Page 23: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Common Fallacies #2

Crowdsourcing is NOT JUST analyzing user-generated content

Cope with the noise in UGC, not only with the information it carries.

How to manage the imperfection & diversity in UGC?

Page 24: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Common Fallacies #2

Crowdsourcing is NOT JUST analyzing user-generated content

Put the task element in the loop

Re-purposing the creation of UGC as your own microtasks

Page 25: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Common Fallacies #3

Crowdsourcing is NOT JUST posting tasks on Mechanical Turk

Explicit Crowdsourcing

Implicit Crowdsourcing

Piggyback Crowdsourcing

Doan et al., "Crowdsourcing systems on the World-Wide Web," CACM, vol. 54, no. 4, 2011.

Page 26: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

An implicit crowdmining platform for multimedia content

Page 27: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Crowdsourcing for Data Mining: Issues

Purposes: Annotation (ground-truth generation), Evaluation, Retrieval, Human-in-the-loop computation

Methodologies: Recruiting, Incentives, Task Design, Workflow, Learning from crowd, Quality control, Cheat detection

Page 28: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

The Era of Too Many Photos

People today use pictures to record their daily experiences (thanks to the prevalence of digital cameras)

Page 29: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

How to Share Photos?

Page 30: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

3 Common Ways

Photo browsing, Photo/video slideshow, Illustrated text

Page 31: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Photo Browsing

Page 32: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Photo/Video slideshow

Page 33: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Illustrated Text

Page 34: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

A MISSING PIECE

Page 35: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Comics

Page 36: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Photo Comics – Baby Born

Page 37: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Photo Comics – Birthday Party

Page 38: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Photo Comics – Daily Fun

Page 39: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Media Comparison

Media             Creation Cost   Viewer Req.   Viewer Control   Richness   Portability
Photo browsing    Low             Low           High             Low        Low
Slideshow         Medium          Low           Low              Medium     Low
Illustrated text  High            High          High             High       High
Comic             High            Low           High             High       High

How to lower it?

Page 40: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Comic Making – Cartoonist’s Way

Page 41: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

http://www.pomics.net

Page 42: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Goal of Pomics

Page 43: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Pomics = Picture to Comics

Page 44: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Computer-Aided Storytelling

[Flow diagram. Labels: Picture; Location; Timing Analysis; Aesthetics Analysis; Semantics Analysis; User Preference (Own rating, Popularity); Auto Storytelling; Automated Adjustment; Machine Learning; Draft Story; User Editing; Final Story]

Page 45: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Technical Challenges #1

Semantics Analysis: Human recognition, Emotion recognition, Behavior recognition, Object recognition, Location identification, Natural language processing

Aesthetics Analysis: Exposure, Composition

Timing Analysis

Contextual Analysis

Page 46: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Technical Challenges #2

Automatic Storytelling: Significant photo selection, Pagination and page layout, Narrative design

Page 47: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Pomics as a Social Service

[Diagram labels: Web albums; Web resources; Publish & share]

Page 48: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Live Demo

Page 49: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

HOW IS IT RELATED TO CROWDSOURCING?

Page 50: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

USERS ARE IMPLICITLY DOING IMAGE ANNOTATION AND EVALUATION

Page 51: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

What pictures are used?

Why were the 3 pictures used?

Aesthetics information
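
One way to read the aesthetics signal: when several authors build comics from the same album, the photos they keep choosing are presumably the more appealing ones. The following is a minimal sketch under that assumption. The function name, the data layout, and the photo IDs are hypothetical and are not taken from the Pomics implementation.

```python
from collections import Counter

def selection_rates(album, books):
    """Fraction of comic books that used each photo of a shared album.

    album: list of photo IDs available to every author.
    books: list of sets, each holding the photo IDs one comic book used.
    """
    # Count each photo at most once per book.
    used = Counter(photo for book in books for photo in set(book))
    total = len(books)
    return {photo: (used[photo] / total if total else 0.0) for photo in album}

# Hypothetical data: one album of four photos, three comic books made from it.
album = ["p1", "p2", "p3", "p4"]
books = [{"p1", "p2"}, {"p1", "p3"}, {"p1"}]

rates = selection_rates(album, books)
for photo, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(photo, f"{rate:.2f}")
```

A photo that every author picks gets a rate of 1.0, while a photo nobody uses gets 0.0; such rates could serve as weak, implicitly collected aesthetics labels.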

Page 52: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Wizard Interface

Aesthetics information

Page 53: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

The Page Layout

Semantics

Saliency info
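
A saliency signal could plausibly be derived from how authors fit photos into frames. The sketch below assumes that the region an author keeps when placing a photo in a frame is recorded as a rectangle, and that regions kept by many authors are the salient ones. The rectangle format and the function name are hypothetical; the slides do not describe how Pomics stores this information.

```python
def crop_heatmap(width, height, crops):
    """Count, per pixel, how many crop rectangles cover it.

    crops: list of (left, top, right, bottom) boxes in pixel coordinates,
    with right and bottom exclusive. Boxes are clipped to the photo bounds.
    """
    heat = [[0] * width for _ in range(height)]
    for left, top, right, bottom in crops:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                heat[y][x] += 1
    return heat

# Three hypothetical crops of a tiny 6x4 photo.
crops = [(0, 0, 4, 3), (2, 1, 6, 4), (2, 0, 5, 3)]
heat = crop_heatmap(6, 4, crops)
for row in heat:
    print(" ".join(str(v) for v in row))
```

Pixels with the highest counts mark the area that most authors chose to keep, which is a crude stand-in for a saliency map.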

Page 54: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Usage Statistics of Pomics (since July 15 2012)

352 authors
434 comic books
4,362 frames
4,332 images used
1,057 image annotations
3,789 text balloons
3000+ shares on Facebook
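
A few ratios help put these counts in perspective. The snippet only divides the numbers reported on the slide; the dictionary keys are illustrative names.

```python
# Counts reported on the slide.
stats = {
    "authors": 352,
    "comic_books": 434,
    "frames": 4362,
    "images_used": 4332,
    "image_annotations": 1057,
    "text_balloons": 3789,
}

books_per_author = stats["comic_books"] / stats["authors"]
frames_per_book = stats["frames"] / stats["comic_books"]
annotations_per_image = stats["image_annotations"] / stats["images_used"]
balloons_per_frame = stats["text_balloons"] / stats["frames"]

print(f"books per author:      {books_per_author:.2f}")
print(f"frames per book:       {frames_per_book:.2f}")
print(f"annotations per image: {annotations_per_image:.3f}")
print(f"balloons per frame:    {balloons_per_frame:.2f}")
```

Roughly, a typical book has about ten frames, and about one in four used images carries an annotation.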

Page 55: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

WHAT HAVE WE GATHERED SO FAR?

Page 56: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Picture Aesthetics Info

Page 57: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Picture Aesthetics (cont.)

Page 58: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Picture Saliency Info

Page 59: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Picture Semantics

Love / Like / Dear; Happy; Sleepy / sleeping; Tears; Wearing a hat; NO!

Page 60: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Can Pomics Do Micro-tasks?

The answer is YES!

Users were asked to create comics using a specific album.

They were rewarded with a 200 MB quota if their books were “shared” by 20+ FB users.
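
The incentive rule stated here is simple enough to write down directly. This is a minimal sketch; whether the reward is granted per book or per author is not stated on the slide, so the per-book interpretation and the function name are assumptions.

```python
QUOTA_REWARD_MB = 200     # reward stated on the slide
MIN_FACEBOOK_SHARES = 20  # "20+ FB users"

def quota_reward_mb(share_count):
    """Return the quota reward (in MB) earned by one comic book.

    Assumption: the reward is evaluated per book, based on how many
    Facebook users shared it.
    """
    return QUOTA_REWARD_MB if share_count >= MIN_FACEBOOK_SHARES else 0

for shares in (5, 20, 35):
    print(shares, "shares ->", quota_reward_mb(shares), "MB")
```

Tying the reward to shares rather than to submission alone gives authors a reason to produce comics that other people actually want to read.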

Page 61: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Picture Aesthetics from Microtasks

Page 62: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Picture Saliency from Microtasks

Page 63: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Crowdmining Services

Advantages: Little or no hiring cost once the right incentives are given; Easy to scale up; The game rules can be changed to fit the research

Disadvantages: High development cost; Less flexible; Hard to find the right incentives (besides money)

Page 64: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Conclusion

Crowdmining is a promising and exciting area

Crowdsourcing != Mechanical Turking

A lot more can be done with crowdmining services

Build your own crowdmining service today!

Page 65: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

CrowdMM 2012

(in conjunction with ACM Multimedia 2012)

Keynote: Prof. Masataka Goto (AIST, Japan)

11 oral+poster presentations: Annotation, Evaluation, Novel applications

An industrial panel discussion

You are welcome to join us!

http://crowdmm.org/

Page 66: Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research

Kuan-Ta Chen, Academia Sinica

Unleash the power of the Crowd!

Thank You!

http://www.iis.sinica.edu.tw/~swc