The Use of Provenance in Information Retrieval
Simone Stumpf, Erin Fitzhenry, Tom Dietterich

Dec 29, 2015


The Use of Provenance in Information Retrieval

Simone Stumpf, Erin Fitzhenry, Tom Dietterich


Defining Provenance

To us, provenance concerns:

The origin of content within documents

The relationships between documents

[Diagram labels: AttachmentSave, SaveAs]


Why focus on Provenance for Information Retrieval?

People remember the relationships between documents!

Episodic vs. Semantic Memory Studies:

• Blanc-Brude & Scapin (2007)
• Gonçalves & Jorge (2004)

No need to formulate keyword queries

Other common document attributes are often inaccurately remembered (Blanc-Brude & Scapin 2007):

• Title (20% false recall)
• Size (53.8% false recall)
• Time (47.6% false recall)


Example Use Case: “Where did I save that again?”

I got an email from Tom…

I saved the attachment…

And I pasted some information from the attachment into a PowerPoint document…

Where did that presentation go??


Requirements for Tracking and Visualizing Provenance

Instrument all important document provenance events

Provenance events are NOT automatically captured by Windows

Develop a UI enabling users to locate documents via the provenance relationships they remember

Integrate the UI into the Windows Desktop


Capturing Provenance Events with TaskTracer

TaskTracer is a Personal Information Management system

User defines a hierarchy of Projects or Activities

As the user works, TaskTracer automatically tags (according to task/project):

• Files
• Folders
• Email Messages
• Email Contacts
• Web pages


Instrumenting TaskTracer to Capture Provenance Events

TaskTracer already instruments many desktop events:

• Open, Save, SaveAs, Close
• EmailArrived, Email Open, Email Close
• Open URL, Close URL, Follow Hyperlink

Idea: Extend existing instrumentation to cover key provenance events

• CopyPaste, SaveAs, FileCopy/Rename
• AttachmentAdd, AttachmentOpen, AttachmentSave, EmailForward*, EmailReply*
• FileDownload, FileUpload*

*Coming soon
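
The provenance event vocabulary on this slide can be written down as a small enumeration. This is a minimal sketch and not TaskTracer's actual code: the class name `ProvenanceEventType`, the `label` and `implemented` attributes, and the helper `available_events` are hypothetical. Only the event names and the "coming soon" markers come from the slide.

```python
from enum import Enum


class ProvenanceEventType(Enum):
    """Provenance events named on the slide; value = (label, implemented?)."""

    COPY_PASTE = ("CopyPaste", True)
    SAVE_AS = ("SaveAs", True)
    FILE_COPY_RENAME = ("FileCopy/Rename", True)
    ATTACHMENT_ADD = ("AttachmentAdd", True)
    ATTACHMENT_OPEN = ("AttachmentOpen", True)
    ATTACHMENT_SAVE = ("AttachmentSave", True)
    EMAIL_FORWARD = ("EmailForward", False)  # starred on the slide: coming soon
    EMAIL_REPLY = ("EmailReply", False)      # starred on the slide: coming soon
    FILE_DOWNLOAD = ("FileDownload", True)
    FILE_UPLOAD = ("FileUpload", False)      # starred on the slide: coming soon

    def __init__(self, label, implemented):
        # Unpack the tuple value into two readable attributes.
        self.label = label
        self.implemented = implemented


def available_events():
    """Return the labels of events that are not marked 'coming soon'."""
    return [event.label for event in ProvenanceEventType if event.implemented]
```

Keeping the "coming soon" status next to each event name means a capture layer could skip unimplemented events without maintaining a second list.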


Instrumenting TaskTracer to capture Provenance Events (cont.)

A Provenance Event (example):

• Event_id: 10233
• Event_type: SaveAs
• Event_time: Jan 12
• “From” Resource: oldFile.doc (Id: 1768, etc.)
• “To” Resource: newFile.doc (Id: 1923, etc.)

Database of document-to-document provenance relationships
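
One way to picture the event record and its database is the sketch below. It is an illustration under stated assumptions, not the TaskTracer schema: the table name `provenance_event`, the column names, and the helpers `create_store`, `record` and `derived_from` are hypothetical, and SQLite stands in for whatever database the system actually uses. The sample row reuses the slide's example values, read as a SaveAs event with id 10233 linking resource 1768 to resource 1923.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    """One document-to-document provenance relationship."""

    event_id: int
    event_type: str
    event_time: str        # kept as free text, as on the slide ("Jan 12")
    from_resource_id: int  # the "From" resource
    to_resource_id: int    # the "To" resource


def create_store():
    """Create an in-memory database with a single events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE provenance_event ("
        " event_id INTEGER PRIMARY KEY,"
        " event_type TEXT NOT NULL,"
        " event_time TEXT NOT NULL,"
        " from_resource_id INTEGER NOT NULL,"
        " to_resource_id INTEGER NOT NULL)"
    )
    return conn


def record(conn, event):
    """Store one provenance event."""
    conn.execute(
        "INSERT INTO provenance_event VALUES (?, ?, ?, ?, ?)",
        (event.event_id, event.event_type, event.event_time,
         event.from_resource_id, event.to_resource_id),
    )


def derived_from(conn, resource_id):
    """Return ids of resources created directly from the given resource."""
    rows = conn.execute(
        "SELECT to_resource_id FROM provenance_event"
        " WHERE from_resource_id = ? ORDER BY event_id",
        (resource_id,),
    )
    return [row[0] for row in rows]


store = create_store()
record(store, ProvenanceEvent(10233, "SaveAs", "Jan 12", 1768, 1923))
```

Calling `derived_from(store, 1768)` then returns the id of the file produced by the SaveAs event.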


A tool for visualizing provenance

Developing a User Interface: TaskTrail

[Screenshot callouts: User’s Query; Click to Expand; Mouse over details; Double-click to open]
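
The query-then-expand interaction can be read as a bounded breadth-first walk over the provenance graph, starting from the item the user queried. The sketch below is one possible reading, not TaskTrail's implementation: the functions `build_graph` and `expand`, the `max_hops` parameter, and the sample resource names are all hypothetical. Edges are followed in both directions on the assumption that a user may remember either end of a relationship.

```python
from collections import defaultdict, deque


def build_graph(events):
    """Index (from_resource, event_type, to_resource) triples both ways."""
    graph = defaultdict(list)
    for src, event_type, dst in events:
        graph[src].append((event_type, dst))
        graph[dst].append((event_type, src))
    return graph


def expand(graph, query, max_hops):
    """Breadth-first search: map each reachable resource to its hop count."""
    hops = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the requested depth
        for _event_type, neighbour in graph[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


# Hypothetical data modelled on the "Where did I save that again?" use case.
events = [
    ("email from Tom", "AttachmentSave", "report.doc"),
    ("report.doc", "CopyPaste", "talk.ppt"),
]
graph = build_graph(events)
```

With this data, `expand(graph, "email from Tom", 1)` reaches only the saved attachment, and raising `max_hops` to 2 also reaches the presentation, which mirrors expanding one node at a time.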


Integrating TaskTrail into the Windows UI

Launch a query by right-clicking on an item within Windows Explorer, Outlook, or TaskExplorer


Research Questions

• Does TaskTrail help users find documents more quickly than other methods?

• How should the provenance graph be laid out?

• What kind of provenance events do users accurately recall?

• How large are the provenance graphs?

• What patterns, if any, exist in the succession of provenance events?


User Studies: Formative

• Observational Study (planned)
  • What provenance-related actions do users perform? Which of those do they remember?
  • Observe 12 participants in their workplaces
  • Record provenance-related actions performed
  • Interview participants after 1 week to see what they remember
    • Free Recall
    • Cued Recall

• How do users lay out their documents according to what they remember?


User Studies: Summative

• TaskTrail Study at Intel (in progress)

• 4 participants (so far) are using TaskTracer for at least 1 month each

• Then they will use TaskTrail to locate their own documents

• Measures of success:
  • Do users locate more documents using TaskTrail?
  • Do users locate documents more quickly using TaskTrail?
  • Do users prefer using TaskTrail?


Provenance-related User Studies are Hard!

Must be done “in the wild”

Involves:

• Long time-scales, which increase chances that:
  • Participants will drop out
  • Situation on site will change

• Potentially sensitive information
  • Emails to/from users not participating in the study
  • Documents regarding trade secrets

• Installation of some event-tracking software
  • Software installation/maintenance can introduce compatibility, scheduling and other problems