COMMUNITY PROFILING FOR CROWDSOURCING QUERIES
Khalid Belhajjame 1, Marco Brambilla 2, Daniela Grigori 1, Andrea Mauri 2
1 PSL, Paris-Dauphine University, LAMSADE, France
2 Politecnico di Milano, Italy
Jul 13, 2015
Traditional vs Community Crowdsourcing
• General structure:
• the requestor poses some questions
• a wide set of responders, typically unknown to the requestor, is in charge of providing answers
• the system organizes a response collection campaign
• Traditional Crowdsourcing
• Cost – Quality Tradeoff
• Complex results aggregation
• Community Crowdsourcing
• Matching the task to the “correct” group of workers
Monday 15th September 2014, Community Profiling for Crowdsourcing Queries
Community
A set of people that share
• Interests
• Features
…or belong to a
• common entity
• social network
Leveraging communities
• Why?
• Experts
• More engaged
• How?
• Determine the communities of performers
• Target the correct community
• Monitor them, taking into account the behavior of their members
The approach
• Models
• Query Model
• Community Model
• Matching strategies
• Keyword-based
• Semantic-based
Query Model
• Textual description of the task
• Examples of query and responses
• Knowledge needed
• Prior knowledge (a knowledge base) that can be used to partially answer the query or to identify potential answers.
• Type of the task
• Unary: tag, classify, like, …
• N-ary: match, cluster, …
• Objects
• Kind, description, text, metadata …
• Temporal
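The elements above can be captured in a simple data structure. A minimal sketch; the class and field names are illustrative, not taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class CrowdQuery:
    """Sketch of the query model (field names are our own)."""
    description: str                               # textual description of the task
    examples: list = field(default_factory=list)   # example queries and responses
    knowledge: list = field(default_factory=list)  # prior knowledge usable for partial answers
    task_type: str = "unary"                       # "unary" (tag, classify, like) or "n-ary" (match, cluster)
    objects: list = field(default_factory=list)    # objects to evaluate (kind, description, metadata)
    deadline: str = ""                             # temporal constraint, if any

q = CrowdQuery(description="Classify images returned for a professor's name",
               task_type="unary",
               objects=["img1.jpg", "img2.jpg"])
print(q.task_type, len(q.objects))  # unary 2
```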
Community Model
• Textual description of the community
• name, web page, …
• Type of the community
• Explicit: Statically existing and consolidated
• Implicit: Dynamically built on demand
• Definition
• Intentional: defined by a property
• Extensional: list of members
• Both
• Grouping factor
• Friendship, interest, location, expertise, affiliation
Community Model
• Content
• Produced by the people of the community
• Members’ profiles
• Explicit
• Implicit
• Communication channel
• Email, Facebook, LinkedIn, Twitter, blogs or web sites (reviews, expert sites), AMT
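The community model can be sketched analogously; again the attribute names are illustrative assumptions, not the paper's notation:

```python
from dataclasses import dataclass, field

@dataclass
class Community:
    """Sketch of the community model (attribute names are our own)."""
    name: str
    ctype: str                                     # "explicit" (pre-existing) or "implicit" (built on demand)
    definition: str                                # "intensional" (a property), "extensional" (member list), or both
    grouping: str                                  # friendship, interest, location, expertise, affiliation
    members: set = field(default_factory=set)      # extensional part, if any
    channels: list = field(default_factory=list)   # email, Facebook, Twitter, AMT, ...
    content: list = field(default_factory=list)    # content produced by the members

db_group = Community(name="DB group", ctype="explicit", definition="extensional",
                     grouping="affiliation", members={"prof1", "prof2"},
                     channels=["email"])
print(db_group.ctype, len(db_group.members))  # explicit 2
```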
Relations between Communities
• Subsumption
• A given community contains another community
• e.g. the community of sport fans contains the community of soccer fans
• Similarity
• Two communities refer to similar expertise or topic
• e.g. experts in classical music and experts in opera
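For extensionally defined communities, both relations can be checked directly on member sets. A sketch under that assumption; using Jaccard overlap as the similarity measure is our choice, not prescribed by the slides:

```python
def subsumes(parent: set, child: set) -> bool:
    """Subsumption: every member of the child community belongs to the parent."""
    return child <= parent

def similarity(a: set, b: set) -> float:
    """Jaccard overlap between two member sets (one possible similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 0.0

sport_fans = {"ann", "bob", "carl", "dora"}
soccer_fans = {"bob", "carl"}
print(subsumes(sport_fans, soccer_fans))    # True: sport fans contains soccer fans
print(similarity(sport_fans, soccer_fans))  # 0.5: they share 2 of 4 distinct members
```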
Matching
• Keyword-based
• Communities and the query are treated as bags of words
• Requires indexing
• Semantic-based
• Communities and the query are mapped to concepts
• Requires semantic annotation
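A toy illustration of the keyword-based strategy: the query and each community description are treated as bags of words and scored by word overlap. The scoring function is our simplification; a real deployment would use an inverted index with TF-IDF weighting:

```python
import re
from collections import Counter

def bag(text: str) -> Counter:
    """Bag-of-words representation of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def score(query: str, community_text: str) -> int:
    """Number of shared word occurrences between the query and a community."""
    q, c = bag(query), bag(community_text)
    return sum(min(q[w], c[w]) for w in q)

# hypothetical community descriptions
communities = {
    "DB group": "databases query processing data integration",
    "AI group": "machine learning planning reasoning",
}
query = "evaluate query results over databases"
best = max(communities, key=lambda name: score(query, communities[name]))
print(best)  # DB group
```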
Community Control
Community control consists in adapting the crowdsourcing campaign to the behavior of the community
• Task / Object allocation (granularity)
• Static / Dynamic
SOCM’14, Monday, April 7
CrowdSearcher
A prototype that allows the definition, execution and control of a crowdsourcing campaign
http://crowdsearcher.search-computing.org/
Example (dynamic control)
[Data model diagram: µTObjExecution, Task, Object, Performer and Community entities, each paired with a control table (Task Control, Object Control, Performer Control, Community Control) holding attributes such as Status, Score, Enabled, completed objects/executions and timestamps]

Control rule (event / condition / action):
e: AFTER UPDATE FOR µTObjExecution
c: CommunityControl[CommunityID == NEW.CommunityID].score <= 0.5
   CommunityControl[CommunityID == NEW.CommunityID].eval = 10
a: SET CommunityControl[CommunityID == DB-Group].Enabled = true
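In reactive terms, the rule fires after each micro-task execution update: when the executing community's quality score drops to 0.5 or below, a fallback community (DB-Group in the example) is enabled. A minimal sketch; the data structures and the moving-average score update are our own assumptions, not CrowdSearcher's actual implementation:

```python
# hypothetical control tables, one row per community
community_control = {
    "AI-Group": {"score": 1.0, "enabled": True},
    "DB-Group": {"score": 1.0, "enabled": False},
}

def after_update(community_id: str, correct: bool) -> None:
    """Event: fired after each micro-task execution update."""
    ctrl = community_control[community_id]
    # running quality score via exponential moving average (our choice of metric)
    ctrl["score"] = 0.8 * ctrl["score"] + 0.2 * (1.0 if correct else 0.0)
    # condition: the community's score fell to 0.5 or below
    if ctrl["score"] <= 0.5:
        # action: enable the fallback community, as in the slide's rule
        community_control["DB-Group"]["enabled"] = True

for correct in [False, False, False, False]:  # four wrong evaluations from AI-Group
    after_update("AI-Group", correct)
print(community_control["DB-Group"]["enabled"])  # True (score fell to 0.4096)
```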
Experiment
• 16 professors within two research groups in our department (DB and AI groups)
• The top 50 images returned by the Google Image API for each query
• Each expert has to evaluate 5 images at a time
• Results are accepted when enough agreement on the class of the image is reached
• Evaluated objects are removed from new executions
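The acceptance criterion ("enough agreement on the class of the image") can be sketched as a simple majority rule; the vote threshold and agreement share here are our assumptions:

```python
from collections import Counter

def accepted(votes: list, min_votes: int = 3, min_share: float = 0.6):
    """Return the agreed class if enough evaluators converge, else None."""
    if len(votes) < min_votes:
        return None  # not enough evaluations yet
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_share else None

print(accepted(["relevant", "relevant", "not relevant"]))  # relevant (2/3 >= 0.6)
print(accepted(["relevant", "not relevant"]))              # None (too few votes)
```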
Communities
The communities:
• the research group of the professor,
• the research area containing the group (e.g. Computer Science)
• and the whole department (which accounts for more than 600 people in different areas)
Invitations are sent:
• inside-out: we started with invitations to experts, i.e. people in the same groups as the professor (DB and AI), then expanded invitations to Computer Science, then to the whole Department, and finally to open social networks (Alumni and PhD communities on Facebook and LinkedIn);
• outside-in: we proceeded in the opposite way, starting with the Department members, then restricting to Computer Scientists, and finally to the group's members.
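The two strategies only differ in the order in which the nested communities are contacted. A sketch, with wave names taken from the slide:

```python
def invitation_waves(strategy: str) -> list:
    """Order in which the nested communities are invited (names from the slide)."""
    if strategy == "inside-out":  # experts first, then broaden
        return ["research group", "research area", "department", "social network"]
    if strategy == "outside-in":  # broadest first, then narrow down (no social-network wave)
        return ["department", "research area", "research group"]
    raise ValueError(f"unknown strategy: {strategy}")

print(invitation_waves("inside-out")[0])   # research group
print(invitation_waves("outside-in")[0])   # department
```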
Number of performers per community
[Chart: number of performers over time (July 18-28, 2013) for each community: research group, research area, department, social network, and total. Annotations: 46%, 24%, 16%, 9 / "a lot"]
Precision of performers per community
[Chart: precision vs. number of evaluations (0-3000) for each community: research group, research area, department, social network, and total]
Precision of the evaluated objects
• Precision decreases for less expert communities
• The inside-out strategy (from expert to generic users) outperforms the outside-in strategy (from generic to expert users)
[Chart: precision vs. number of closed objects (0-800) for the main experiment (inside-out) and the reverse-invitation experiment (outside-in)]
General observations
A given community of workers can be broken down into
(possibly overlapping) sub-communities with different
expertise
Experts from a community feel more engaged with the task
• They are more demanding with respect to the quality of the application UI and of the evaluated objects
• They provide feedback on the application, the questions and the evaluated objects
• “How is it possible that this image is related to me?!”
Conclusions
• Communities can be effectively used for tasks that require domain expertise
• Open issues:
• How to deal with tasks requiring multiple areas of expertise
• How to build a knowledge base that allows profiling of both communities and queries in an optimal way
• How to cope with the dynamics over time of
• communities and tasks (changing needs)
• communities and worker expertise
Thanks for your attention
Any Questions?
http://crowdsearcher.search-computing.org/
Contacts
Khalid Belhajjame [email protected]
Marco Brambilla [email protected]
Daniela Grigori [email protected]
Andrea Mauri [email protected]
References
• Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo Volonterio. 2014. Pattern-Based Specification of Crowdsourcing Applications. In Proceedings of the 14th International Conference on Web Engineering (ICWE 2014), 218-235.
• Marco Brambilla, Stefano Ceri, Andrea Mauri, Riccardo Volonterio. 2014. Community-based Crowdsourcing. In The 2nd International Workshop on the Theory and Practice of Social Machines. Proceedings of the 23rd International Conference on World Wide Web (Companion Volume).
• Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri. 2013. Reactive Crowdsourcing. In Proceedings of the 22nd International Conference on World Wide Web (WWW 2013).
• Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri, Giuliano Vesci. 2013. Choosing the right crowd: expert finding in social networks. In Proceedings of the 16th International Conference on Extending Database Technology (EDBT 2013). ACM, USA, 637-648.
• Alessandro Bozzon, Marco Brambilla, and Stefano Ceri. 2012. Answering search queries with CrowdSearcher. In Proceedings of the 21st International Conference on World Wide Web (WWW '12). ACM, New York, NY, USA, 1009-1018.
• Alessandro Bozzon, Marco Brambilla, Andrea Mauri. 2012. A Model-Driven Approach for …