Browser Evaluation Test …A Trial Run Pierre Wellner & Mike Flynn, IDIAP Fribourg Nov 26, 2004 Mike Flynn, Pierre Wellner IDIAP Simon Tucker, Steve Whittaker University of Sheffield
Dec 19, 2015

Browser Evaluation Test

…A Trial Run

Pierre Wellner & Mike Flynn, IDIAP
Fribourg, Nov 26, 2004

Mike Flynn, Pierre Wellner, IDIAP
Simon Tucker, Steve Whittaker, University of Sheffield

Outline

• Reminder of BET
• Trial Run
• Results
• Analysis
• Future work

Reminder

• What is a Browser for?
  “Browsing a meeting recording is an attempt to find a maximum number of observations of interest in a minimum amount of time.”
• “Observations of Interest”
  – Pairs of complementary statements about the meeting
  – Of interest to… the participants, or to people who missed the meeting.
• Observers
  – Unlimited access
  – No time limit
    • actually 4½ x meeting time (on average)
• Subjects
  – Answer as many Questions as possible
  – Time limit: ½ meeting time
  – Questions are observation pairs, without indication
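
A minimal sketch of how an observation pair might be presented to a subject as a question. The names `ObservationPair`, `as_question` and `is_correct` are hypothetical, the statements are placeholders, and the sketch assumes that "without indication" means the subject is not told which statement of the pair is the true one.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationPair:
    """One observation of interest: a true statement and its complement."""
    true_statement: str
    false_statement: str


def as_question(pair, rng):
    """Return both statements in random order, hiding which one is true."""
    statements = [pair.true_statement, pair.false_statement]
    rng.shuffle(statements)
    return tuple(statements)


def is_correct(pair, chosen):
    """A subject's answer is correct when it picks the true statement."""
    return chosen == pair.true_statement


# Placeholder statements, not taken from the trial data.
pair = ObservationPair(
    true_statement="<statement that holds for the meeting>",
    false_statement="<complementary statement that does not hold>",
)
question = as_question(pair, random.Random(0))
print(question)
```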

The BET Process

[Diagram. Labels: meeting participants, recording system, corpus, observers, playback system, observations, sampling, tests, subjects, browser under test, answers, scoring, scores]

Trial Run: Observers

• Needed native English speakers
  – University of Sheffield
  – Students, researchers, lecturers
• Meetings: 1 x 44 minutes
• Observers: 6
• Observations: 294 (only 255 used)
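
The time budgets implied by these figures can be worked out directly. This is only an arithmetic sketch: it assumes the reminder slide's "½ meeting time" limit and "4½ x meeting time" observer average both apply to this 44-minute meeting, and the variable names are illustrative.

```python
MEETING_MINUTES = 44         # length of the single trial-run meeting
SUBJECT_LIMIT_FACTOR = 0.5   # subjects get half the meeting time
OBSERVER_AVG_FACTOR = 4.5    # observers took 4.5 x meeting time on average

subject_limit = MEETING_MINUTES * SUBJECT_LIMIT_FACTOR
observer_average = MEETING_MINUTES * OBSERVER_AVG_FACTOR

OBSERVATIONS_COLLECTED = 294
OBSERVATIONS_USED = 255
used_fraction = OBSERVATIONS_USED / OBSERVATIONS_COLLECTED

print(f"Subject time limit: {subject_limit:.0f} minutes")
print(f"Average observer time: {observer_average:.0f} minutes")
print(f"Share of observations used: {used_fraction:.1%}")
```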

Observer’s Screen Shot

Observations… about the observations

• Examples:
  Agnes thinks having the sofa along the whiteboard is a good idea.
  Agnes thinks the sofa will be in the way if under the whiteboard.
  Martin wants to put the coffee machine along the left wall.
  Martin wants to put the coffee machine along the right wall.
• Mainly about what was said, not done
• Participants' names all in top ten words
  – Others: the, of, to, at, is, that
• 283/294 (83%) use participant by name
• Observation density…

Observation Density Graph

[Graph: observations per minute (0 to 20) against media time (00:00 to 50:00)]

Trial Run: Subjects

• 11f + 13m = 24 total
• University of Sheffield
• Three conditions:
  “Guess” - no media whatsoever
  “Base” - same media as Observers
  “F1” - Ferret with Brno ASR transcript + slides + speaker segmentations

Guess Condition Screen Shot

Base Condition Screen Shot

F1 Condition Screen Shot

Results: Guess Condition

Subject  Answers  Correct  Incorrect  Score
A1       255      142      113        55.7%
A2       220      123      97         55.9%
A3       135      81       54         60.0%
Total    610      346      264        56.7%
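
The Score column is consistent with correct answers divided by total answers, and the Total row with pooling all answers before dividing (rather than averaging the three subject scores). A short sketch that recomputes the Guess figures under that reading; the function and variable names are illustrative.

```python
# subject: (answers, correct), copied from the Guess condition table
guess = {"A1": (255, 142), "A2": (220, 123), "A3": (135, 81)}


def score(answers, correct):
    """Percentage of answers that were correct."""
    return 100.0 * correct / answers


per_subject = {s: round(score(a, c), 1) for s, (a, c) in guess.items()}

# Pool all answers across subjects, then compute one score.
total_answers = sum(a for a, _ in guess.values())
total_correct = sum(c for _, c in guess.values())
pooled = round(score(total_answers, total_correct), 1)

# For comparison: the unweighted mean of the per-subject scores.
mean_of_subjects = sum(score(a, c) for a, c in guess.values()) / len(guess)

print(per_subject)
print(total_answers, total_correct, pooled)
print(round(mean_of_subjects, 1))
```

Pooling weights each subject by the number of questions answered, so a subject who answers many questions dominates the condition total.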

Results: Base Condition

Subject     Answers  Correct  Incorrect  Score
B1          22       14       8          63%
B2          25       17       8          68%
B3          12       7        5          58%
B4          8        8        0          100%
B5          5        2        3          40%
B6          3        1        2          33%
B7          12       8        4          66%
B8          5        4        1          80%
B9          8        3        5          37%
B10         22       12       10         54%
B11         4        4        0          100%
Base Total  126      80       46         63.5%

Results: F1 Condition

Subject   Answers  Correct  Incorrect  Score
C1        20       11       9          55%
C2        6        3        3          50%
C3        18       17       1          94%
C4        21       12       9          57%
C5        18       11       7          61%
C6        11       7        4          63%
C7        6        6        0          100%
C8        14       10       4          71%
C9        12       11       1          91%
C10       7        2        5          28%
F1 Total  133      90       43         67.7%

Details

• Scores by time
• Media time-difference
• Speed versus accuracy

Results by time, overlaid

[Graph “Scores by Time”: average score per question (0% to 70%) against time left (0 to 23); series: Base Condition, F1 Condition, Guess Condition]

Media time difference histogram

[Histogram “Proximity of Answers to Questions”: number of answers (-5 to 20) against media time difference in minutes (-45 to 45); series: Incorrect, Correct]

Speed versus Accuracy graph

[Graph “Speed versus Accuracy”: questions correct (0% to 100%) against questions answered (0 to 30); series: Base condition, F1 condition, Base mean, F1 mean, Guess mean]

BET scores

Condition  Speed  Accuracy
Guess      27.7   56.7%
Base       5.7    63.5%
F1         6.0    67.7%
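
The slides do not define Speed. The three values are, however, reproduced by dividing each condition's total number of answers by a 22-minute limit (half of the 44-minute meeting), so the sketch below assumes that definition. Accuracy is taken as pooled correct answers over pooled answers, as in the results tables.

```python
TIME_LIMIT_MINUTES = 22  # assumption: half of the 44-minute meeting

# condition: (total answers, total correct), from the results tables
totals = {"Guess": (610, 346), "Base": (126, 80), "F1": (133, 90)}

bet_scores = {}
for condition, (answers, correct) in totals.items():
    speed = answers / TIME_LIMIT_MINUTES  # answers per minute, summed over subjects
    accuracy = 100.0 * correct / answers  # percentage of answers that were correct
    bet_scores[condition] = (round(speed, 1), round(accuracy, 1))

for condition, (speed, accuracy) in bet_scores.items():
    print(f"{condition:<6} {speed:>5.1f} {accuracy:>5.1f}%")
```

Under this reading Speed is not normalised by the number of subjects, so it is not directly comparable between conditions with different group sizes (3, 11 and 10 subjects here).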

Future work

• AMI recording 100 hour corpus
• More observations
• More subjects
  – reduce confidence interval (~18% wide)
• Design, test & compare browser improvements
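
The slides do not say how the confidence interval was computed. As a rough illustration only, the sketch below applies a 95% normal-approximation (Wald) interval to the pooled Base and F1 accuracies. It treats every answer as an independent trial, which ignores differences between subjects, and the function name `wald_interval` is illustrative.

```python
import math


def wald_interval(correct, answers, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 for ~95%)."""
    p = correct / answers
    half_width = z * math.sqrt(p * (1 - p) / answers)
    return p - half_width, p + half_width


# (condition, total correct, total answers) from the results tables
for name, correct, answers in [("Base", 80, 126), ("F1", 90, 133)]:
    low, high = wald_interval(correct, answers)
    print(f"{name}: {low:.1%} to {high:.1%} (width {high - low:.1%})")
```

With these sample sizes the intervals come out somewhat over 15 percentage points wide, the same order as the figure quoted above. Because the half-width shrinks with the square root of the number of answers, halving the interval needs roughly four times as many answers.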