Information Management via CrowdSourcing, Hector Garcia-Molina, Stanford University

Oct 01, 2020

Transcript
Page 1:

Information Management via CrowdSourcing

Hector Garcia-Molina

Stanford University

Page 2:

Crowdsourcing

Page 3:

Not to be confused with:

• Wisdom of the Crowd

• Cloud Computing :-)

Page 4:

Question A

Does figure show > 45 dots?

Page 5:

Does figure show > 45 dots?

Page 6:

Does figure show > 45 dots?

Report Results for Question A

Page 7:

Question B

Does figure show > 45 dots?

Page 8:

Does figure show > 45 dots?

Page 9:

Does figure show > 45 dots?

Report Results for Question B

Page 10:

Many Crowdsourcing Marketplaces!

Page 11:

Real World Examples

Image Matching

Translation

Categorizing Images

Search Relevance

Data Gathering

Page 12:

Many Research Projects!

Page 13:

The Many Faces of Crowdsourcing

Page 14:

The Many Faces of Crowdsourcing

Human-Computer Interaction

Software Systems

Machine Learning

Human Issues

Information Management

Page 15:

Crowd Information Management

• Two Aspects:

– Crowd as Information Source

– Crowd as Data Processor


Page 16:

Fundamental Tradeoffs

Latency

Cost

Uncertainty

Page 17:

Efficiency: Fundamental Tradeoffs

• Latency: How long can I wait?

• Cost: How much $$ can I spend?

• Uncertainty: What is the desired quality?

Page 18:

Efficiency: Fundamental Tradeoffs

• Latency: How long can I wait?

• Cost: How much $$ can I spend?

• Uncertainty: What is the desired quality?

• Which questions do I ask humans?

• Do I ask in sequence or in parallel?

• How much redundancy in questions?

• How do I combine the answers?

• When do I stop?

Page 19:

Example: CrowdScreen

[Figure: a dataset of items and a predicate go in; for each item X, workers are asked "Item X satisfies predicate?" and answer Y or N (e.g. Y, Y, N); a filtered dataset comes out]

Page 20:

Strategy

[Figure: grid of answer counts, with NOs on one axis and YESs on the other, each running from 1 to 6]

Page 21:

Strategy

[Figure: the NOs/YESs grid (1 to 6 on each axis), with each point marked as continue, decide PASS, or decide FAIL]
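
Read as a lookup table, such a grid drives a simple sequential loop: ask one worker, update the (NOs, YESs) counts, and consult the strategy until it reaches a PASS or FAIL point. The Python sketch below is only an illustration. The strategy `first_to_three` and the helper `run_strategy` are hypothetical names, and the stopping rule (stop once either count reaches 3) is an assumption, not the strategy drawn on the slide.

```python
from typing import Callable, Iterable, Tuple

CONTINUE, PASS, FAIL = "continue", "decide PASS", "decide FAIL"


def first_to_three(nos: int, yeses: int) -> str:
    """Hypothetical strategy: stop as soon as either count reaches 3."""
    if yeses >= 3:
        return PASS
    if nos >= 3:
        return FAIL
    return CONTINUE


def run_strategy(strategy: Callable[[int, int], str],
                 answers: Iterable[bool]) -> Tuple[str, int]:
    """Consume worker answers (True = YES) until the strategy decides.

    Returns the decision and the number of questions asked.
    """
    nos = yeses = 0
    answer_iter = iter(answers)
    # Look up the current grid point before asking each new question.
    while (decision := strategy(nos, yeses)) == CONTINUE:
        try:
            answer = next(answer_iter)
        except StopIteration:
            raise ValueError("ran out of answers before a decision point") from None
        if answer:
            yeses += 1
        else:
            nos += 1
    return decision, nos + yeses
```

For the answer sequence YES, NO, YES, YES, `run_strategy(first_to_three, [True, False, True, True])` stops after the fourth answer with a PASS decision. Any other grid can be plugged in by swapping the strategy function.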

Page 22:

Strategy

[Figure: the NOs/YESs grid (1 to 6 on each axis), with a decision point highlighted]

Page 23:

More Examples

[Figure: NOs/YESs grid, 1 to 6 on each axis]

Page 24:

More Examples

[Figure: NOs/YESs grid, 1 to 6 on each axis]

Page 25:

More Examples

[Figure: NOs/YESs grid, 1 to 6 on each axis]

Page 26:

More Examples

[Figure: NOs/YESs grid, 1 to 6 on each axis]

Page 27:

Some Optimizations

[Figure: NOs/YESs grid, 1 to 6 on each axis]

Page 28:

What is “best” strategy?

[Figure: two NOs/YESs grids side by side, 1 to 6 on each axis]

Page 29:

What is “best” strategy?

[Figure: NOs/YESs grid, 1 to 6 on each axis]

p(x,y) = probability of misclassification at x,y

end(x,y) = probability of terminating at x,y

Expected error:

∑ p(x,y)*end(x,y)

Expected cost:

∑ (x+y)*end(x,y)
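
Both sums can be evaluated mechanically by propagating probabilities through the grid. The Python sketch below is one way to do that under assumptions the slides do not state: x counts NO answers and y counts YES answers, an item truly passes with prior probability `selectivity`, and each worker answers independently and is wrong with probability `error_rate`. The function names and the example strategy `first_to_two` are hypothetical.

```python
from typing import Callable, Dict, Tuple

CONTINUE, PASS, FAIL = "continue", "pass", "fail"
Strategy = Callable[[int, int], str]  # (x = NOs, y = YESs) -> decision


def evaluate(strategy: Strategy, selectivity: float, error_rate: float,
             max_questions: int) -> Tuple[float, float]:
    """Return (expected error, expected cost) of a strategy."""
    # reach[(x, y)] = (P(at (x, y) and item passes), P(at (x, y) and item fails))
    reach: Dict[Tuple[int, int], Tuple[float, float]] = {
        (0, 0): (selectivity, 1.0 - selectivity)
    }
    expected_error = 0.0
    expected_cost = 0.0
    # Visit points in order of questions asked, so predecessors come first.
    for asked in range(max_questions + 1):
        for x in range(asked + 1):
            y = asked - x
            if (x, y) not in reach:
                continue
            p_pass, p_fail = reach[(x, y)]
            decision = strategy(x, y)
            if decision == CONTINUE:
                if asked == max_questions:
                    raise ValueError("strategy does not stop within max_questions")
                # A YES is correct for a passing item, wrong for a failing one.
                _add(reach, (x, y + 1), p_pass * (1 - error_rate), p_fail * error_rate)
                _add(reach, (x + 1, y), p_pass * error_rate, p_fail * (1 - error_rate))
            else:
                end = p_pass + p_fail                           # end(x, y)
                wrong = p_fail if decision == PASS else p_pass  # p(x, y) * end(x, y)
                expected_error += wrong
                expected_cost += (x + y) * end
    return expected_error, expected_cost


def _add(reach, point, d_pass, d_fail):
    """Accumulate probability mass arriving at a grid point."""
    old_pass, old_fail = reach.get(point, (0.0, 0.0))
    reach[point] = (old_pass + d_pass, old_fail + d_fail)


def first_to_two(x: int, y: int) -> str:
    """Hypothetical strategy: PASS at 2 YESs, FAIL at 2 NOs."""
    if y >= 2:
        return PASS
    if x >= 2:
        return FAIL
    return CONTINUE
```

With `selectivity=0.5` and `error_rate=0.1`, `evaluate(first_to_two, 0.5, 0.1, 3)` gives an expected error of about 0.028 and an expected cost of about 2.18 questions per item.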

Page 30:

One (of many) optimization problems:

[Figure: NOs/YESs grid, 1 to 6 on each axis]

Find the strategy that minimizes expected cost (# questions), such that expected error is less than a threshold (and the number of questions never exceeds m).
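
Using the quantities p(x,y) and end(x,y) defined on Page 29, this problem can be written compactly as below. The symbol τ for the error threshold is notation introduced here for illustration; the slides only say "threshold".

```latex
\begin{aligned}
\min_{\text{strategy}} \quad & \sum_{(x,y)} (x+y)\,\mathrm{end}(x,y) \\
\text{subject to} \quad & \sum_{(x,y)} p(x,y)\,\mathrm{end}(x,y) < \tau, \\
& \mathrm{end}(x,y) = 0 \quad \text{whenever } x + y > m .
\end{aligned}
```

The last constraint encodes the question budget: no termination point may lie beyond m answers in total.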

Page 31:

Example of Results

Page 32:

Beyond Single Filter

• Probabilistic Strategies

• Multiple Filters

• Categorizer (output more than 2 types)


Page 33:

Beyond Filtering

• Finding Max

• Sorting

• Clustering

• Entity Resolution

• Adding terms to a taxonomy

• Building a Folksonomy

• ...


Page 34:

Beyond Simple Models

• Worker Error Models

• Task Design

• Tracking Worker Abilities

• Payments

• Response Time Issues

• ...


Page 35:

Crowd As Information Source

DBMS-like thing

Declarative queries

Web

Page 36:

The Deco Data Model

[Figure labels: End user: relations. Schema designer: conceptual schema (relations and other stuff). Automatic (system): actual schema, in an RDBMS.]

Page 37:

Small Example

User view:

restaurant     rating   cuisine
Chez Panisse   4.9      French
Chez Panisse   4.9      California
Bytes          3.8      California
...            ...      ...

Page 38:

Small Example

User view (⟕, outer join of the tables below):

restaurant     rating   cuisine
Chez Panisse   4.9      French
Chez Panisse   4.9      California
Bytes          3.8      California
...            ...      ...

Anchor:

restaurant
Chez Panisse
Bytes
...

Dependent:

restaurant     rating
Chez Panisse   4.8
Chez Panisse   5.0
Chez Panisse   4.9
Bytes          3.6
Bytes          4.0
...            ...

Dependent:

restaurant     cuisine
Chez Panisse   French
Chez Panisse   California
Bytes          California
Bytes          California
...            ...

Page 39:

Small Example

[Figure: the user view and the anchor and dependent tables from Page 38, annotated with three "fetch rule" labels; the example values shown are Bytes and Chez Panisse]

Page 40:

Small Example

[Figure: the user view and the anchor and dependent tables from Page 38, annotated with four "fetch rule" labels; the example values shown are Bytes, Chez Panisse, and French]

Page 41:

Small Example

[Figure: the user view and the anchor and dependent tables from Page 38, annotated with two "resolution rule" labels; the example values shown are Bytes and Chez Panisse]

Page 42:

Small Example

[Figure: the user view and the anchor and dependent tables from Page 38]

1. Fetch

2. Resolve

3. Join
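
The three steps can be traced on the restaurant example with a few lines of Python. This is a sketch, not Deco's implementation. The raw tables stand in for the output of the fetch step, and the two resolution rules (average the ratings, eliminate duplicate cuisines) are assumptions inferred from the example values, since the slides do not name the rules. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# 1. Fetch: raw tables as they might look after fetching (values from the slides).
anchor = ["Chez Panisse", "Bytes"]
raw_ratings = [("Chez Panisse", 4.8), ("Chez Panisse", 5.0), ("Chez Panisse", 4.9),
               ("Bytes", 3.6), ("Bytes", 4.0)]
raw_cuisines = [("Chez Panisse", "French"), ("Chez Panisse", "California"),
                ("Bytes", "California"), ("Bytes", "California")]


def group(pairs):
    """Group (restaurant, value) pairs into restaurant -> list of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def resolve_rating(values):
    """Assumed resolution rule: average the raw ratings, to one decimal."""
    return [round(mean(values), 1)] if values else [None]


def resolve_cuisine(values):
    """Assumed resolution rule: drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(values)) if values else [None]


def user_view(anchor, raw_ratings, raw_cuisines):
    ratings = group(raw_ratings)
    cuisines = group(raw_cuisines)
    rows = []
    for restaurant in anchor:
        # 2. Resolve each dependent table for this anchor value.
        resolved_ratings = resolve_rating(ratings.get(restaurant, []))
        resolved_cuisines = resolve_cuisine(cuisines.get(restaurant, []))
        # 3. Join: outer join, so a missing dependent value appears as None.
        for rating in resolved_ratings:
            for cuisine in resolved_cuisines:
                rows.append((restaurant, rating, cuisine))
    return rows
```

Calling `user_view(anchor, raw_ratings, raw_cuisines)` reproduces the three rows of the user view: Chez Panisse with 4.9 and French, Chez Panisse with 4.9 and California, and Bytes with 3.8 and California.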

Page 43:

Many Query Processing Challenges

SELECT n,l,c
FROM country
WHERE l = 'Spanish'
ATLEAST 8

[Figure: alternative query plans for this query, built from the operators AtLeast[8], Filter[l='Spanish'], Join, Resolve[m3], Resolve[d.e], Scan A(n), Scan D1(n,l), Scan D2(n,c), and Fetch with the rules [n], [ln], [nl], [ln,c], and [nl,c]]

Page 44:

Deco Prototype V1.0

Page 45:

Experimental Setup

• Experimental Goals:

  – Different fetch configurations

    • Basic: n + nl + nc

    • Reverse: ln + nl + nc

    • Hybrid: ln,c + nl,c

  – Different filter locations

    • After vs. between joins

• Experimental Setup:

  – 5 cents/task on MTurk

  – Empty tables initially

  – Default: reverse + between

SELECT n,l,c FROM country WHERE l = 'Spanish' ATLEAST 8
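
One possible reading of part of the reverse configuration for this query, sketched in Python. It assumes that "ln" means "given a language l, ask for a country name n", that "nc" means "given a name n, ask for its capital c", and that ATLEAST 8 means "keep fetching until the result has at least 8 rows". Resolution and the "nl" rule are left out for brevity. The function names, the `max_tasks` guard, and the stub crowd with its data are all hypothetical stand-ins for real crowd tasks.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, str, str]  # (n, l, c)


def reverse_fetch(language: str, at_least: int,
                  ask_name: Callable[[str], Optional[str]],
                  ask_capital: Callable[[str], str],
                  max_tasks: int = 100) -> Tuple[List[Row], int]:
    """Fetch rows until `at_least` distinct countries are found.

    Returns the rows and the number of crowd tasks issued.
    """
    capitals: Dict[str, str] = {}
    tasks = 0
    while len(capitals) < at_least and tasks < max_tasks:
        name = ask_name(language)           # one task of type "ln"
        tasks += 1
        if name is None or name in capitals:
            continue                        # paid for, but yields no new row
        capitals[name] = ask_capital(name)  # one task of type "nc"
        tasks += 1
    return [(n, language, c) for n, c in capitals.items()], tasks


# Illustrative stub data; a real system would post tasks to a marketplace.
STUB_CAPITALS = {
    "Spain": "Madrid", "Mexico": "Mexico City", "Peru": "Lima",
    "Chile": "Santiago", "Cuba": "Havana", "Colombia": "Bogotá",
    "Argentina": "Buenos Aires", "Uruguay": "Montevideo",
}


def make_stub_crowd(names: Iterable[str]):
    """Build fake crowd callables that replay a fixed list of name answers."""
    answers = iter(names)

    def ask_name(language: str) -> Optional[str]:
        return next(answers, None)

    def ask_capital(name: str) -> str:
        return STUB_CAPITALS[name]

    return ask_name, ask_capital
```

Duplicate answers still cost a task, which is one reason the choice of fetch rules affects the total cost of a query.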


Page 46:

Experiment 1: Different Fetch Rules

Page 47:

Experiment 1: Different Fetch Rules

$12.00 and 2 hrs

$2.20 and 15 min

$1.30 and 11 min
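
Combined with the price of 5 cents/task from the experimental setup, these totals imply a rough number of paid tasks per run. The short Python calculation below assumes that every task cost exactly 5 cents and that the totals include no fees, neither of which the slides confirm. The slide does not say which fetch configuration produced which result, so the list simply follows slide order.

```python
PRICE_CENTS_PER_TASK = 5  # from the experimental setup: 5 cents/task on MTurk

# (total cost in cents, latency in minutes), in slide order; 2 hrs = 120 min.
results = [(1200, 120), (220, 15), (130, 11)]


def implied_tasks(cost_cents: int, price_cents: int = PRICE_CENTS_PER_TASK) -> int:
    """Number of paid tasks implied by a total cost, assuming a flat price."""
    tasks, remainder = divmod(cost_cents, price_cents)
    if remainder:
        raise ValueError("cost is not a whole multiple of the task price")
    return tasks


task_counts = [implied_tasks(cost) for cost, _ in results]
# Ratio between the most and least expensive runs.
cost_ratio = results[0][0] / results[2][0]
```

Under these assumptions the three runs correspond to 240, 44, and 26 tasks, and the most expensive run cost roughly nine times as much as the cheapest.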

Page 48:

Conclusion

• Crowdsourcing is an exciting area!

• Many challenges!
