Known-Item Search. Matthias Hagen, Bauhaus-Universität Weimar, @matthias_hagen. B-S-S Anniversary, Eisenach, September 16, 2015.
Data Analysis to Support Decision-Making in Everyday Work

Apr 14, 2017

Known-Item Search

Matthias Hagen

Bauhaus-Universität Weimar

@matthias_hagen

B-S-S Anniversary, Eisenach

September 16, 2015

Matthias Hagen Known-Item Search 1

The scenario

This is not just a problem of philosoraptor!

Known-item search

Re-finding previously seen/heard items like

Documents

Websites

Emails

Tweets

Movies

Music

Books

TV

Remarks: Users have some knowledge about their need. Only very few relevant documents out there.

Problem

How do users search for known items?

Studies on re-finding known items

Web search [Sadeghi et al., ECIR 2015]

[Tyler and Teevan, WSDM 2010]

[Adar et al., CHI 2008]

[Azzopardi et al., SIGIR 2007]

[Teevan, TOIS 2008, UIST 2007]

[Beitzel et al., SIGIR 2003]

Twitter search [Meier and Elsweiler, IIiX 2014]

Email search [Elsweiler et al., SIGIR 2011, ECIR 2011, TOIS 2008]

PIM [Kim and Croft, SIGIR 2010, CIKM 2009]

[Kelly et al., IIiX 2008]

[Blanc-Brude and Scapin, IUI 2007]

[Boardman and Sasse, CHI 2004]

[Dumais et al., SIGIR 2003]

[Barreau and Nardi, SIGCHI Bulletin 1995]

Problem: Most corpora and queries not freely available.

Exceptions: Known-item query generation

Automatic extraction

1 Select some document

2 Draw most discriminative terms

3 Add random noise

Web [Azzopardi et al., SIGIR 2007]

PIM [Kim and Croft, CIKM 2009]

Email [Elsweiler et al., SIGIR 2011]

Human computation game

1 Select some document

2 Show it to a user for some time

3 Ask for a query retrieving it top-ranked

PIM [Kim and Croft, SIGIR 2010]

Problem: Not really “natural” settings.
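The three automatic-extraction steps can be sketched roughly as follows. This is only an illustration, not the procedure of the cited papers: the TF-IDF-style weighting, the toy corpus, and the names `discriminative_terms` and `simulate_query` are assumptions made for this sketch.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization; a deliberately crude stand-in."""
    return text.lower().split()


def discriminative_terms(doc_id, corpus, k):
    """Rank the terms of one document by a TF-IDF-style score (assumed weighting)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for text in corpus.values():
        doc_freq.update(set(tokenize(text)))
    term_freq = Counter(tokenize(corpus[doc_id]))
    scores = {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def simulate_query(doc_id, corpus, k=3, noise=0.2, rng=None):
    """Step 2: draw discriminative terms. Step 3: swap some for random vocabulary terms."""
    rng = rng or random.Random()
    vocabulary = sorted({t for text in corpus.values() for t in tokenize(text)})
    query = []
    for term in discriminative_terms(doc_id, corpus, k):
        if rng.random() < noise:
            # Noise: replace with a term that may be unrelated to the document.
            term = rng.choice(vocabulary)
        query.append(term)
    return query


# Step 1: select some document (here simply by id from a toy corpus).
corpus = {
    "d1": "puppies in a box puppies movie opening scene",
    "d2": "a movie about a hitman and a job offer",
    "d3": "a song about a box and a scene",
}
print(simulate_query("d1", corpus, k=2, noise=0.0))
```

Uniform random replacement is about the simplest possible noise model. It illustrates the slide's point: such noise is random, whereas human memory errors are not.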

Human memory: Not perfect but also not random

Reasons for memory failure?

Psychology!

Our goal

A large corpus of difficult and realistic known-item needs.

Remark: Freely available!

The general idea [Hauff et al., IIiX 2012]

1 Fetch known-item questions from Yahoo! Answers

To ensure realistic human information needs

Websites, movies, music, books, TV series

2 Link questions to a large static web crawl

Environment for repeatable research

ClueWeb09 chosen

3 Construct queries from questions

Maybe via crowdsourcing

Not part of this paper

Question acquisition

Querying Yahoo! Answers API:

forgot AND name AND film

forgot AND title AND song

remember AND title AND movie

forgot AND url AND (website OR (web site))

(remember OR forgot) AND (name OR title) AND book

37 such queries in total

24,765 answered questions returned

Problems: Not all questions are really “answered.” Not all questions are known-item intents. Not all questions are linkable to the ClueWeb09.
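To illustrate what such boolean keyword queries select, here is a small local evaluator. It does not call or imitate the Yahoo! Answers API. The tuple-based query representation, the function `matches`, the sample questions, and the reading of "(web site)" as a two-word phrase are all assumptions of this sketch.

```python
def matches(query, text):
    """Evaluate a nested boolean query against a text.

    A query is either a keyword/phrase string or a tuple
    ("AND" | "OR", subquery, subquery, ...).
    """
    if isinstance(query, str):
        words = text.lower().split()
        phrase = query.lower().split()
        # A phrase matches if its words occur consecutively in the text.
        return any(words[i:i + len(phrase)] == phrase
                   for i in range(len(words) - len(phrase) + 1))
    operator, *operands = query
    results = (matches(q, text) for q in operands)
    return all(results) if operator == "AND" else any(results)


# forgot AND url AND (website OR (web site))
website_query = ("AND", "forgot", "url", ("OR", "website", "web site"))
# (remember OR forgot) AND (name OR title) AND book
book_query = ("AND", ("OR", "remember", "forgot"), ("OR", "name", "title"), "book")

# Hypothetical question texts, made up for this example.
questions = [
    "i forgot the url of a web site about old maps",
    "i can not remember the title of this book",
    "what is the best book about maps",
]
for question in questions:
    print(matches(website_query, question), matches(book_query, question))
```

The third sample question matches neither query. Conversely, a keyword match alone does not guarantee a known-item intent, which is why manual assessment is still needed.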

Corpus cleansing

Answered status

Keep when best answer selected by asker

8,825 questions remain (only about 36% of original crawl)

Known-item status and ClueWeb linkage need manual assessment

Two independent annotators

About 400 hours of work

3,406 questions with known-item information need

2,755 can be linked to ClueWeb09 documents

Only these form our dataset

Problem: Hardly any website questions remained.
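The filtering steps form a funnel whose retention rates can be recomputed from the counts on the slides. The snippet below is plain arithmetic on those four numbers; the labels and the helper `retention` are naming choices made here.

```python
# Counts reported on the slides.
funnel = [
    ("answered questions returned", 24_765),
    ("best answer selected by asker", 8_825),
    ("known-item information need", 3_406),
    ("linked to ClueWeb09", 2_755),
]


def retention(funnel):
    """Return (label, count, share of previous step, share of first step)."""
    first = funnel[0][1]
    rows = []
    previous = first
    for label, count in funnel:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


for label, count, step_share, total_share in retention(funnel):
    print(f"{label:32s} {count:6d}  {step_share:6.1%} of previous  {total_share:6.1%} of crawl")
```

The second row reproduces the "about 36%" stated above. The same arithmetic shows that roughly 39% of the asker-confirmed questions were judged known-item needs, about 81% of those could be linked to ClueWeb09, and the final dataset is about 11% of the original crawl.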

ClueWeb09 coverage

Over the years

Question from    2006   2007   2008   2009   2010   2011   2012
Our dataset        68    176    369    701    578    477    364
Coverage        89.5%  92.2%  86.0%  86.2%  79.6%  77.3%  71.9%

Type of associated URL

95% Wikipedia

5% other

Corpus analysis

Initial observation

False memories hinder total recall

False memories in questions

Movie “... starts off with a box full of free puppies ...”

Question

Actual known item

Note a difference?!

False memories in questions

Movie “... Morgan Freeman offers him a job to kill ...”

Question

Actual known item

Note a difference?!

Funny! But these are just a few outliers?!

False memories statistics

At least 240 questions (9% of corpus) contain false memories

Most frequent false memories: Person names!

Remark: Makes me think ...

Does my mail search take this into account?

Potential usage of the corpus

Observation: False memories hinder good results. Might even yield zero-result lists!

Retrieval systems should

Detect false memory situations

“Repair” the query

Leave out the false memory or

Replace it with correction

Our corpus might be a starting point in that direction.
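One naive way to "repair" a query by leaving out a suspected false memory is to retry a zero-result conjunctive query with individual terms dropped. The sketch below is an illustrative heuristic, not a method from the talk: the toy index, the placeholder names "alpha" and "beta", and the functions `search` and `repair_by_omission` are all hypothetical.

```python
from itertools import combinations


def search(terms, index):
    """Conjunctive retrieval: ids of documents containing every query term."""
    return sorted(doc_id for doc_id, text in index.items()
                  if all(t in text.lower().split() for t in terms))


def repair_by_omission(terms, index, max_dropped=1):
    """If the full query has no results, retry with up to max_dropped terms left out.

    Returns (results, dropped_terms); the dropped terms are the suspected
    false memories under this simple heuristic.
    """
    results = search(terms, index)
    if results:
        return results, ()
    for n in range(1, max_dropped + 1):
        for dropped in combinations(terms, n):
            kept = [t for t in terms if t not in dropped]
            results = search(kept, index)
            # Never accept the empty query, which would match everything.
            if kept and results:
                return results, dropped
    return [], ()


# Toy index; "alpha" and "beta" stand in for person names.
index = {
    "d1": "thriller with a hitman played by actor alpha",
    "d2": "comedy about free puppies in a box",
}
query = ["thriller", "hitman", "beta"]  # "beta" plays the misremembered name
print(repair_by_omission(query, index))
```

Replacing the false memory with a correction, the second option, would additionally need a source of candidate substitutions. Annotated false memories such as those in the corpus could serve as training or evaluation data for that.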

Other fields: False memory implantation

Remark: We are not working on that!

A little scary?!

Let’s finish the talk in a better mood!

You know this song?!

One more hint needed?!

Yes, the Bee Gees!

Ah, ha, ha, ha, steak and a knife, steak and a knife

Some funny false memories really are Mondegreens.

... that is, misheard lyrics.

Almost the end: The take-home messages!

What we have (not) done

Results

2,755 known-item questions

Posted by real human users

Linked to the ClueWeb09

False memories annotated

Often refer to persons

Or song lyrics

Future Work

Enlarge the corpus

Especially website known items

Web queries for the questions

False memory detection

Thank you,
