Page 1: Text versus Speech: A Comparison of Tagging Input Modalities for Camera Phones

Research & Development

Mauro Cherubini, Xavier Anguera, Nuria Oliver, and Rodrigo de Oliveira

people do not want to tag their pictures

intro → hypotheses → methodology → results → implications

research question:

Assuming that users are willing to input at least one tag, which input modality better supports producing tags and retrieving the pictures?

hypothesis 1

Speech is preferred to text as an annotation mechanism on mobile phones (objective measure)

Support: Mitchard and Winkles (2002)

hypothesis 1-bis

Speech annotations are preferred by users even if this means spending more time on the task (subjective measure)

Support: Perakakis and Potamianos (2008)

hypothesis 2

The longer the tag, the larger the advantage of voice over text for annotating pictures on mobile phones

Support: Hauptmann and Rudnicky (1990)

hypothesis 3

Retrieving pictures on mobile phones with speech is not faster than with text (objective measure)

Support: Mills et al. (2000)

the user study

field study (4 weeks)

controlled experiment: tasks T1, T2, T3, T4

3 experimental conditions:
a. Speech only
b. Text only
c. Speech and Text

MAMI

features of MAMI

•  processing is done entirely on the mobile phone

•  speech is not transcribed

•  to compare the waveforms of the audio tags, MAMI uses the Dynamic Time Warping (DTW) algorithm
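The slide does not describe MAMI's implementation. Purely as an illustration, the sketch below shows how untranscribed audio tags could be matched with textbook Dynamic Time Warping. It assumes each recording has already been reduced to a sequence of feature frames (for example MFCC vectors; feature extraction is not shown), uses a Euclidean frame distance, and normalises the path cost by the two sequence lengths. The names `dtw_distance` and `rank_pictures` and the toy 1-D "frames" are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one feature vector per audio frame (e.g. MFCCs).
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Sequence[Frame], ref: Sequence[Frame]) -> float:
    """Textbook O(n*m) DTW: cost of the cheapest monotonic alignment of the two
    frame sequences, normalised by n + m so tags of different lengths compare."""
    n, m = len(query), len(ref)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = cheapest alignment of query[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], ref[j - 1])
            # extend the best of a vertical, horizontal or diagonal step
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def rank_pictures(query: Sequence[Frame],
                  audio_tags: Dict[str, Sequence[Frame]]) -> List[Tuple[str, float]]:
    """Rank pictures by DTW distance between a spoken query and each stored
    audio tag, smallest distance first; no speech-to-text step is involved."""
    scored = [(picture, dtw_distance(query, tag)) for picture, tag in audio_tags.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy example with 1-D "frames": the query is a slower rendition of the beach tag.
audio_tags = {
    "beach.jpg": [[1.0], [2.0], [3.0], [2.0]],
    "garden.jpg": [[5.0], [5.0], [0.0]],
}
query = [[1.0], [1.0], [2.0], [3.0], [2.0]]
print(rank_pictures(query, audio_tags)[0])  # ('beach.jpg', 0.0)
```

The vertical and horizontal steps let one frame align with several frames of the other recording, which is why a slower or faster utterance of the same tag can still score close to zero. A deployed system would likely also need silence trimming and a match threshold, which this sketch leaves out.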

task 1: remember the tag

stimulus → retrieval

Pictures taken during the field trial

task 2: remember the context

stimulus → retrieval

[Figure: task 2, picture 1, with example tags: three little bushes, Garden, Tree, Stairs]

task 3: remember the picture

stimulus → retrieval

Audio tags were converted into textual tags and vice versa.

task 4: remember the sequence

assignment → retrieval

Assignment: three pictures among the oldest and three pictures among the newest.

metrics

•  time to completion

•  false positives

•  retrieval errors

results H1

results H1-bis

All participants in the BOTH group felt that tagging with text was more effective than tagging with voice.

Voice: 3.33 [0.81]; Text: 4.34 [0.81] (mean [SD]; 1 = completely agree, 5 = completely disagree)

results H2

results H3

results H3 - continued

take away 1: speech is not a given

the advantage of audio as an input modality for tagging pictures on mobile phones is not a given

why?
1. retrieval precision
2. privacy

take away 2: input mistakes

we address text input mistakes immediately; by contrast, mistakes in audio recordings are addressed less frequently

take away 3: memory

speech does not help users memorize the tags

implication 1: allow multiple modalities

© Pixar, 2008

implication 2: enable audio inspection

implication 3: enable modality synesthesia

© Disney, 1940

end: thanks


http://www.i-cherubini.it/mauro/blog/ http://research.tid.es/multimedia/
