1. Creating a Data Collection for Evaluating Rich Speech Retrieval
   Maria Eskevich (1), Gareth J.F. Jones (1), Martha Larson (2), Roeland Ordelman (3)
   (1) Centre for Digital Video Processing, Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin, Ireland
   (2) Delft University of Technology, Delft, The Netherlands
   (3) University of Twente, The Netherlands

2. Outline
   - MediaEval benchmark
   - MediaEval 2011 Rich Speech Retrieval Task
   - What is crowdsourcing?
   - Crowdsourcing in the development of speech and language resources
   - Development of an effective crowdsourcing task
   - Comments on results
   - Conclusions
   - Future work: Brave New Task at MediaEval 2012
3. MediaEval: Multimedia Evaluation benchmarking initiative
   - Evaluates new algorithms for multimedia access and retrieval.
   - Emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context.
   - Innovates new tasks and techniques focusing on the human and social aspects of multimedia content.
4-13. MediaEval 2011: Rich Speech Retrieval (RSR) Task
   Task goal: the information to be found is a combination of the required audio and visual content and the speaker's intention.
   - Conventional retrieval matches on the transcript: two segments with the same transcript are treated alike, even where Meaning 1 differs from Meaning 2.
   - Extended speech retrieval also matches segments whose transcripts differ but whose meaning and speech act coincide (Speech act 1 = Speech act 2).
14-18. MediaEval 2011: RSR Task, ME10WWW dataset
   - Videos from the Internet video sharing platform blip.tv (1974 episodes, 350 hours).
   - Automatic Speech Recognition (ASR) transcripts provided by LIMSI and Vocapia Research.
   - No queries and relevant items.
   -> Collect for the retrieval experiment: user-generated queries and user-generated relevant items.
   -> Collect them via crowdsourcing technology.
19-24. What is crowdsourcing?
   Crowdsourcing is a form of human computation. Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task. A crowdsourcing system facilitates a crowdsourcing process.
   Factors to take into account:
   - Sufficient number of workers
   - Level of payment
   - Clear instructions
   - Possible cheating
25-28. Crowdsourcing in the development of speech and language resources
   Suitability of crowdsourcing for simple/straightforward natural language processing tasks: work by non-expert crowdsource workers is of a similar standard to that performed by expert workers, e.g. for translation and translation assessment, transcription of the workers' native language, word sense disambiguation, and temporal annotation [Snow et al., 2008].
   Research question at the collection creation stage: can untrained crowdsource workers undertake extended tasks which require them to be creative?
Speech RetrievalCrowdsourcing with Amazon Mechanical Turk 30.
Creating a Data Collection for Evaluating Rich Speech
RetrievalCrowdsourcing with Amazon Mechanical Turk Task is referred
to as a Human Intelligence Task or HIT. 31. Creating a Data
Collection for Evaluating Rich Speech RetrievalCrowdsourcing with
Amazon Mechanical Turk Task is referred to as a Human Intelligence
Task or HIT. Crowdsourcing procedure: HIT initiation: Requester
uploads a HIT. 32. Creating a Data Collection for Evaluating Rich
Speech RetrievalCrowdsourcing with Amazon Mechanical Turk Task is
referred to as a Human Intelligence Task or HIT. Crowdsourcing
procedure: HIT initiation: Requester uploads a HIT. Work: Workers
carry out the HIT 33. Creating a Data Collection for Evaluating
Rich Speech RetrievalCrowdsourcing with Amazon Mechanical Turk Task
is referred to as a Human Intelligence Task or HIT. Crowdsourcing
procedure: HIT initiation: Requester uploads a HIT. Work: Workers
carry out the HIT Review: Requester reviews the completed work and
conrms payment to the worker with a previously set payment.
*Requester has an option of paying more (Bonus) 34. Creating a Data
34-38. Information expected from the worker to create a test collection for the RSR Task
   - Speech act type:
     - expressives: apology, opinion
     - assertives: definition
     - directives: warning
     - commissives: promise
   - Time of the labeled speech act: beginning and end
   - Accurate transcript of the labeled speech act
   - Queries to find this speech act: a full sentence query and a short web-style query
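The fields listed above can be checked mechanically before a HIT is accepted. The following is a hypothetical validation sketch; the field names and the seconds-based timestamps are assumptions for illustration.

```python
# Speech act types requested from workers, grouped as on the slide.
SPEECH_ACT_TYPES = {
    "apology", "opinion",   # expressives
    "definition",           # assertives
    "warning",              # directives
    "promise",              # commissives
}

def is_valid_submission(sub: dict) -> bool:
    """Check one worker submission for the expected fields."""
    return (
        sub.get("speech_act") in SPEECH_ACT_TYPES
        and 0 <= sub.get("start", -1) < sub.get("end", -1)  # times in seconds
        and bool(sub.get("transcript", "").strip())         # accurate transcript
        and bool(sub.get("sentence_query", "").strip())     # full sentence query
        and bool(sub.get("web_query", "").strip())          # short web-style query
    )

sub = {"speech_act": "promise", "start": 70.0, "end": 85.0,
       "transcript": "I promise we will ship this next week.",
       "sentence_query": "Where does the speaker promise to ship next week?",
       "web_query": "ship next week promise"}
print(is_valid_submission(sub))  # True
```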
39-40. Data management for Amazon MTurk
   ME10WWW videos vary in length.
   -> Starting points for longer videos, at a distance of approximately 7 minutes apart, are calculated:

   Data set | Episodes | Starting points
   Dev      | 247      | 562
   Test     | 1727     | 3278
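The starting-point calculation above can be sketched as follows. The exact rule used for ME10WWW is not specified here, so the even 7-minute grid is an assumption.

```python
def starting_points(duration_s: float, spacing_s: int = 7 * 60) -> list:
    """Return playback starting offsets (seconds) spaced ~7 minutes apart,
    so long videos get several entry points and short ones get just 0."""
    return list(range(0, int(duration_s), spacing_s))

print(starting_points(10 * 60))  # 10-minute video -> [0, 420]
print(starting_points(5 * 60))   # short video     -> [0]
```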
41-53. Crowdsourcing experiment
   Worker expectations: reward vs. work; per-hour rate.
   The requester uploads the HIT with the pilot wording: 0.11 $ + a bonus per speech act type.
   Workers' feedback: the reward is not worth the work; the task is too complicated.
   The requester updates the HIT: rewording, examples, 0.19 $ + a bonus (0-21 $), workers suggest the bonus size (and we mention that we are a non-profit organization).
   Workers' feedback: the reward is worth the work; the task is comprehensible; workers are not greedy!
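The reward-vs.-work expectation above boils down to the effective per-hour rate a HIT implies. A minimal sketch, where the 10-minute completion time is an illustrative assumption:

```python
def hourly_rate(reward_usd: float, minutes_per_hit: float) -> float:
    """Effective per-hour rate implied by a HIT's reward and duration."""
    return round(reward_usd * 60 / minutes_per_hit, 2)

print(hourly_rate(0.11, 10))  # pilot wording
print(hourly_rate(0.19, 10))  # revised wording
```

Comparing the two rates makes concrete why workers judged the pilot reward "not worth the work" until base pay and bonuses were raised.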
54-55. HIT example
   Pilot: "Please watch the video and find a short portion of the video (a segment) that contains an interesting quote. The quote must fall into one of these six categories."
   Revised: "Imagine that you are watching videos on YouTube. When you come across something interesting you might want to share it on Facebook, Twitter or your favorite social network. Now please watch this video and search for an interesting video segment that you would like to share with others because it is (an apology, a definition, an opinion, a promise, a warning)."
56. Results: number of collected queries per speech act
   Prices: dev set: 40 $ per 30 queries; test set: 80 $ per 50 queries.
57-66. Results assessment
   - Number of accepted HITs = number of collected queries.
   - No overlap of workers between the dev and test sets.
   - Creative work invites creative cheating:
     - Copying and pasting the provided examples -> examples should be pictures, not texts.
     - Choosing the option that no speech act was found in the video -> manual assessment by the requester is needed.
   - Workers rarely find noteworthy content later than the third minute from the playback starting point in the video.
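The two mechanical checks above (accepted HITs must equal collected queries; dev and test workers must not overlap) can be sketched as a small assessment report. The data here is illustrative.

```python
def assess(accepted_hits: int, queries: list,
           dev_workers: set, test_workers: set) -> dict:
    """Report whether the HIT/query counts line up and whether any
    worker contributed to both the dev and the test set."""
    return {
        "hits_match_queries": accepted_hits == len(queries),
        "worker_overlap": dev_workers & test_workers,  # should be empty
    }

report = assess(3, ["q1", "q2", "q3"],
                dev_workers={"W1", "W2"}, test_workers={"W3", "W4"})
print(report)  # {'hits_match_queries': True, 'worker_overlap': set()}
```

Detecting the "creative cheating" cases (pasted examples, spurious "no speech act found" answers) still required manual review by the requester.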
67-72. Conclusions
   - It is possible to crowdsource extensive and complex tasks to support speech and language resources.
   - Use concepts and vocabulary familiar to the workers.
   - Pay attention to the technical issues of watching the video.
   - Preprocess videos into smaller segments.
   - Creative work demands a higher reward level, or just a more flexible system.
   - There is a high level of wastage due to task complexity.
73-76. MediaEval 2012 Brave New Task: Search and Hyperlinking
   Use scenario: a user is searching for a known segment in a video collection. Furthermore, because the information in the segment might not be sufficient for their information need, they want links to other related video segments, which may help to satisfy information needs related to this video.
   Sub-tasks:
   - Search: finding suitable video segments based on a short natural language query.
   - Linking: defining links to other relevant video segments in the collection.
77. Thank you for your attention! Welcome to MediaEval 2012!
   http://multimediaeval.org