CROWDSOURCING AND ITS APPLICATIONS ON SCIENTIFIC RESEARCH

Crowdsourcing = Crowd + Outsourcing “soliciting solutions via open calls to large-scale communities”

Dec 21, 2015

Page 1

CROWDSOURCING AND ITS APPLICATIONS ON SCIENTIFIC RESEARCH

Page 2

Crowdsourcing = Crowd + Outsourcing

"soliciting solutions via open calls to large-scale communities"

Page 3

Page 4

Some Examples

Call for professional help: awards of 50,000 to 1,000,000 for each task

Office work platform

Microtask platform: over 30,000 tasks available at the same time

Page 5

What Tasks Are Crowdsourceable?

Page 6

Software Development

Reward: 25,000 USD

Page 7

Data Entry

Reward: 4.4 USD/hour

Page 8

Image Tagging

Reward: 0.04 USD

Page 9

Trip Advice

Reward: points on Yahoo! Answers

Page 10

The impact of crowdsourcing on scientific research?

Page 11

Amazon Mechanical Turk

A micro-task marketplace
Task prices are usually between 0.01 and 1 USD
Easy-to-use interface

Page 12

Amazon Mechanical Turk

Human Intelligence Task (HIT): a task that is hard for computers

Developer: prepays the money, publishes HITs, gets the results (see the sketch below)

Worker: completes the HITs, gets paid
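A minimal sketch of this requester workflow, assuming the AWS SDK for Python (boto3) and an MTurk requester account with prepaid funds; the task, reward, and question XML below are illustrative placeholders, not taken from the slides:

```python
import boto3

# Connect to the MTurk requester API (a sandbox endpoint can be used for testing).
mturk = boto3.client("mturk", region_name="us-east-1")

# A minimal free-text question, expressed in MTurk's QuestionForm XML.
question_xml = """<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>tags</QuestionIdentifier>
    <QuestionContent><Text>Type a few keywords describing the image.</Text></QuestionContent>
    <AnswerSpecification><FreeTextAnswer/></AnswerSpecification>
  </Question>
</QuestionForm>"""

# Developer: publish the HIT (a task that is hard for computers, easy for people).
hit = mturk.create_hit(
    Title="Tag an image",
    Description="Type a few keywords describing the image.",
    Reward="0.04",                    # USD per assignment, as in the image-tagging example
    MaxAssignments=3,                 # ask several workers, enabling repeated labels
    AssignmentDurationInSeconds=300,
    LifetimeInSeconds=86400,
    Question=question_xml,
)

# Developer: later, fetch the submitted work and approve it so the workers get paid.
result = mturk.list_assignments_for_hit(HITId=hit["HIT"]["HITId"])
for assignment in result["Assignments"]:
    mturk.approve_assignment(AssignmentId=assignment["AssignmentId"])
```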

Page 13

Who are the workers?

Page 14

A Survey of Mechanical Turk
Survey of 1,000 Turkers (Turk workers)
Two identical surveys (Oct. 2008 and Dec. 2008) with consistent results
Blog post: A Computer Scientist in a Business School

Page 15

Country of origin:

United States: 77%
India: 5%
Philippines: 3%
Canada: 3%
United Kingdom: 2%
Germany: 1%
Italy: 0%
Other: 9%

Page 16

Charts: Education, Age, Gender, Annual Income

Page 17

Comparison with Internet demographics (data from comScore). In summary, Turkers are:

younger: 51% are 21-35 years old vs. 22% of Internet users
mainly female: 70% female vs. 50% female
lower income: 65% of Turkers earn less than 60k/year vs. 45% of Internet users
smaller families: 55% of Turkers have no children vs. 40% of Internet users

Page 18

How Much Do Turkers Earn?

Page 19

Why Do Turkers Turk?

Page 20

Research Applications

Page 21

Dataset Collection

Datasets are important in computer science!

In multimedia analysis: Is there X in the image? Where is Y in the image?

In natural language processing: What is the emotion of this sentence?

And in lots of other applications

Page 22

Dataset Collection

Utility annotation by Sorokin and Forsyth at UIUC, for image analysis

Task types: type a keyword, select examples, click on landmarks, outline figures

Page 23

0.01 USD/ task

Page 24

0.02 USD/ task

Page 25

0.01 USD/ task

Page 26

0.01 USD/ task

Page 27

Dataset Collection

Linguistic annotations (Snow et al. 2008): word similarity

USD 0.20 to label 30 word pairs

Page 28

Dataset Collection

Linguistic annotations (Snow et al. 2008): affect recognition

USD 0.40 to label 20 headlines (140 labels)

Page 29

Dataset Collection

Linguistic annotations (Snow et al. 2008): textual entailment

If "Microsoft was established in Italy in 1985", does it follow that "Microsoft was established in 1985"?

Word sense disambiguation: "a bass on the line" vs. "a funky bass line"

Temporal annotation: does "ran" happen before "fell" in "The horse ran past the barn fell"?

Page 30

Dataset Collection

Document relevance evaluation: Alonso et al. (2008)

User rating collection: Kittur et al. (2008)

Noun compound paraphrasing: Nakov (2008)

Name resolution: Su et al. (2007)

Page 31

Data Characteristics

Cost? Efficiency? Quality?

Page 32

Cost and Efficiency

In image annotation (Sorokin and Forsyth, 2008)

Page 33

Cost and Efficiency

In linguistic annotation (Snow et al., 2008)

Page 34

Cheap and fast!

Is it good?

Page 35

Quality

Multiple non-experts can beat experts ("three cobblers together outdo one Zhuge Liang")

Black line: agreement among Turkers
Green line: single expert
Gold-standard result: agreement among multiple experts

Page 36

In addition to Dataset Collection

Page 37

QoE Measurement

QoE (Quality of Experience): a subjective measure of user perception

Traditional approach: user studies with MOS ratings (Bad to Excellent)

Crowdsourcing with paired comparison: diverse user input, easy to understand, and interval-scale scores can be calculated (see the sketch below)
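As one illustration of how interval-scale scores can be computed from paired comparisons, the sketch below fits a Bradley-Terry model with a simple iterative update; the slides do not name a specific scaling model, and the vote counts here are made-up placeholders.

```python
import numpy as np

# wins[i][j] = number of workers who preferred condition i over condition j
wins = np.array([
    [0.0, 8.0, 9.0],
    [2.0, 0.0, 7.0],
    [1.0, 3.0, 0.0],
])

n = wins.shape[0]
scores = np.ones(n)                     # Bradley-Terry strengths, initialized uniformly
total_wins = wins.sum(axis=1)

for _ in range(200):                    # simple minorization-maximization iterations
    denom = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                denom[i] += (wins[i, j] + wins[j, i]) / (scores[i] + scores[j])
    scores = total_wins / denom
    scores /= scores.sum()              # normalize (scores are only defined up to scale)

# log(scores) gives quality estimates on an interval scale.
print(np.log(scores))
```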

Page 38

Acoustic QoE Evaluation

Page 39

Acoustic QoE Evaluation

Which one is better? Simple paired comparison

Page 40

Optical QoE Evaluation

Page 41

Interactive QoE Evaluation

Page 42

Acoustic QoE

MP3 Compression Rate

VoIP Loss Rate

Page 43

Optical QoE

Video Codec

Packet Loss Rate

Page 44

Iterative Tasks

Page 45

Iterative Tasks

TurKit: tools for iterative tasks on MTurk

Imperative programming paradigm with basic elements:
Variables (a = b)
Control (if-else statements)
Loops (for, while statements)

Turns MTurk into a programming platform that integrates human brainpower (see the sketch below)
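A Python-flavored sketch of that pattern (TurKit itself scripts MTurk in JavaScript); improve_hit and vote_hit are hypothetical helpers that would each publish a HIT and block until workers respond:

```python
# Hypothetical helpers: in a real system each would publish a HIT and wait for results.
def improve_hit(text: str) -> str:
    raise NotImplementedError("publish an improve-HIT and return the edited text")

def vote_hit(original: str, candidate: str) -> bool:
    raise NotImplementedError("publish a vote-HIT and return True if workers prefer the candidate")

def iterative_text_improvement(text: str, rounds: int = 5) -> str:
    # Imperative control flow wrapped around human computation, in the spirit of TurKit.
    for _ in range(rounds):               # loop
        candidate = improve_hit(text)     # one worker improves the current text
        if vote_hit(text, candidate):     # other workers vote on the change
            text = candidate              # variable assignment (a = b)
    return text
```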

Page 46

Iterative Text Improvement: a Wikipedia-like scenario

One Turker improves the text; other Turkers vote on whether the improvement is valid

Page 47

Iterative Text Improvement: image description

Instructions for the improve-HIT:
Please improve the description of this image.
People will vote on whether to approve your changes.
Use no more than 500 characters.

Instructions for the vote-HIT:
Please select the better description for this image.
Your vote must agree with the majority to be approved.

Page 48

Iterative Text Improvement: image description (successive worker versions)

"A partial view of a pocket calculator together with some coins and a pen."

"A view of personal items a calculator, and some gold and copper coins, and a round tip pen, these are all pocket and wallet sized item used for business, writing, calculating prices or solving math problems and purchasing items."

"A close-up photograph of the following items: * A CASIO multi-function calculator * A ball point pen, uncapped * Various coins, apparently European, both copper and gold"

"…Various British coins; two of £1 value, three of 20p value and one of 1p value. …"

Page 49

Iterative Text Improvement: image description

"A close-up photograph of the following items:
A CASIO multi-function, solar powered scientific calculator.
A blue ball point pen with a blue rubber grip and the tip extended.
Six British coins; two of £1 value, three of 20p value and one of 1p value.
Seems to be a theme illustration for a brochure or document cover treating finance - probably personal finance."

Page 50

Iterative Text Improvement: handwriting recognition

Version 1: "You (?) (?) (?) (work). (?) (?) (?) work (not) (time). I (?) (?) a few grammatical mistakes. Overall your writing style is a bit too (phoney). You do (?) have good (points), but they got lost amidst the (writing). (signature)"

Page 51

Iterative Text Improvement: handwriting recognition

Version 6: "You (misspelled) (several) (words). Please spell-check your work next time. I also notice a few grammatical mistakes. Overall your writing style is a bit too phoney. You do make some good (points), but they got lost amidst the (writing). (signature)"

Page 52

Cost and Efficiency

Page 53

More on Methodology

Page 54

Repeated Labeling

Crowdsourcing -> multiple imperfect labelers: each worker is a labeler, and labels are not always correct

Repeated labeling improves supervised induction (see the sketch below):
Increases single-label accuracy
Decreases the cost of acquiring training data
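A minimal sketch of the simplest form of repeated labeling, majority voting over several imperfect labels per item; the label data below are illustrative placeholders:

```python
from collections import Counter

# Several noisy labels collected for each item (placeholder data).
labels_per_item = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
}

# Keep the majority label for each item.
aggregated = {
    item: Counter(labels).most_common(1)[0][0]
    for item, labels in labels_per_item.items()
}

print(aggregated)   # {'img_001': 'cat', 'img_002': 'dog'}
```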

Page 55

Repeated Labeling

Repeated labeling improves the overall quality when the accuracy of a single labeler is low.

Page 56

Selected Repeated Labeling: repeat-label the most uncertain points

Label uncertainty (LU): whether the label distribution is stable; calculated from a beta distribution (see the sketch below)

Model uncertainty (MU): whether the model has high confidence in the label; calculated from model predictions
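A sketch of one way a beta-distribution-based label-uncertainty score can be computed for a binary labeling task; the exact formulation here is an assumption rather than something spelled out on the slides:

```python
from scipy.stats import beta

def label_uncertainty(pos: int, neg: int) -> float:
    """Score how unsettled the label distribution is for an item with
    `pos` positive and `neg` negative votes, using a Beta(pos+1, neg+1)
    posterior over the underlying label frequency."""
    mass_below_half = beta.cdf(0.5, pos + 1, neg + 1)
    # Probability mass on the minority side of 0.5: near 0.5 means very uncertain.
    return min(mass_below_half, 1.0 - mass_below_half)

# Items with nearly tied votes get the highest scores and are selected
# for additional labels; lopsided items are left alone.
print(label_uncertainty(3, 3))   # close to 0.5 -> very uncertain
print(label_uncertainty(9, 1))   # close to 0.0 -> confident
```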

Page 57

Selected Repeated Labeling

Selected repeated labeling improves the overall quality of the crowdsourcing approach.

GRR: no selected repeated labeling
MU: Model Uncertainty
LU: Label Uncertainty
LMU: Label and Model Uncertainty integrated

Page 58

Incentive vs. Performance

High financial incentive -> high performance?

User studies (Mason and Watts 2009):
Order images (e.g., choose the busiest image)
Solve word puzzles

Page 59

Incentive vs. Performance

High incentive -> high quantity, not high quality

Page 60

Incentive vs. Performance

Workers always want more
(Figure: how much workers think they deserve)

Users are influenced by the amount they are paid

Pay little at first, and incrementally increase the payment

Page 61

Conclusion

Crowdsourcing provides a new paradigm and a new platform for computer science research.

New applications, new methodologies, and new businesses are developing quickly with the aid of crowdsourcing.