Rec: All Lecture Capture Workshop 11 December 2013 Carlos Turró Universitat Politècnica de València EC FP7 ICT project #287755
Dec 05, 2014
Motivation
12 Nov 2013 2
• Video lecture repositories and MOOCs
• Thousands of hours of video lectures available
• Hundreds of hours of video lectures recorded every week
• Most video lectures only available in their original language
• No subtitles
Motivation
• Transcriptions and translations are needed
• Accessibility for people with disabilities
• Accessibility for speakers of different languages
• Search and analysis functions
• Automated topic finding
• …
• How do we get there?
The transLectures approach
1. Automatic Speech Recognition (ASR) and Machine Translation (MT)
  • Adaptation: taking advantage of the characteristics of video lecture repositories
  • High-quality automatic transcriptions and translations
2. Interactive postediting: intelligent interaction for reduced effort
Goals
• Development of an engine for adaptation and intelligent interaction
• Implementation
• Case studies: Videolectures.NET & Polimedia
• Real-life evaluation
• Integration into Opencast Matterhorn (http://opencast.org/matterhorn/)
The transLectures partners
No.  Name                                   Country
1    Universitat Politècnica de València    Spain
2    Xerox SAS                              France
3    Institut Jožef Stefan                  Slovenia
3+   Knowledge for All Foundation           UK
4    RWTH Aachen University                 Germany
5    EML – European Media Laboratory        Germany
6    DDS – Deluxe Digital Studios           UK
Project duration: 36 months
Now we are in month 25 (M25)
Statistical Transcription (and translation)
[Diagram: Sound → ASR Engine (using the Acoustic Model and the Language Model) → Transcription]
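For reference, the acoustic model and language model on this slide are the two factors of the usual statistical formulation of speech recognition. The symbols below (x for the acoustic observations, w for a candidate word sequence) are notation introduced here, not taken from the slides:

```latex
\hat{w} \;=\; \operatorname*{arg\,max}_{w} \; p(w \mid x)
        \;=\; \operatorname*{arg\,max}_{w} \;
              \underbrace{p(x \mid w)}_{\text{acoustic model}} \;
              \underbrace{p(w)}_{\text{language model}}
```

The second equality follows from Bayes' rule, since p(x) does not depend on w.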
Statistical transcription (and translation)
[Diagram: Manually transcribed voice → Modeling Engine → Acoustic Model and Language Model]
Architecture of TransLectures
[Diagram labels: Lecture; Slides; Extra content; Language Model; Transcription; Translation; Intelligent interaction; Result]
Languages
• Transcription (ASR): EN, SL, ES
• Translation (MT): EN>SL, SL>EN, EN>ES, ES>EN, EN>FR, EN>DE
Case study: VideoLectures.NET
15000 lectures
Case study: Polimedia
10000 Learning Objects
Demo
http://translectures.videolectures.net
http://polimedia.upv.es/catalogo
http://translectures.eu/player/
Scientific evaluations
• Transcription results
• WER: Word Error Rate (%)
• Goal: WER < 20%
• EN, SL, ES
[Chart: WER by language; higher is worse, lower is better]
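As a reminder of what the WER figures measure, here is a minimal sketch: the word-level edit distance between the reference transcription and the ASR output, divided by the number of reference words. The function name and the whitespace tokenisation are assumptions made for this illustration; project evaluations typically rely on established scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the lecture starts now", "the lecture start now")
    print(f"WER = {wer:.1f}%  (goal: below 20%)")
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.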
Scientific evaluations
• Translation results
• BLEU
• Goal: BLEU > 30
• EN>SL, SL>EN
• EN>ES, ES>EN
• EN>FR
• EN>DE
[Chart: BLEU by language pair; higher is better, lower is worse]
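For readers less familiar with BLEU, the sketch below shows a simplified version of the metric: clipped n-gram precisions up to 4-grams, combined by a geometric mean and multiplied by a brevity penalty. It assumes one reference per sentence, whitespace tokenisation, and no smoothing; the function names are illustrative, and reported scores normally come from standard evaluation tools rather than code like this.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale (single reference)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output shorter than the reference.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


if __name__ == "__main__":
    score = corpus_bleu(["the cat sat on the mat today"],
                        ["the cat sat on the mat"])
    print(f"BLEU = {score:.1f}  (goal: above 30)")
```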
Y2 results and comparison
Massive adaptation
• Characteristics of video lectures
  • Just one person
  • Known speaker
  • Clear talking
  • No interruptions
  • Focused on a topic
  • Slides
Massive adaptation
• Known speaker and topic
• Slides
• Related documents
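The slides do not say how slide text and related documents are fed into the models. One common way to do this kind of adaptation, shown here purely as an assumed illustration, is to interpolate a general language model with one estimated from the lecture's own material. The unigram models, function names, and interpolation weight below are all hypothetical simplifications.

```python
from collections import Counter


def unigram_model(text):
    """Estimate a unigram distribution from raw text (toy model)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(general, lecture, weight=0.3):
    """Linear interpolation: (1 - weight) * general + weight * lecture."""
    vocabulary = set(general) | set(lecture)
    return {
        word: (1 - weight) * general.get(word, 0.0) + weight * lecture.get(word, 0.0)
        for word in vocabulary
    }


if __name__ == "__main__":
    general = unigram_model("the model is the best")
    slides = unigram_model("acoustic model")  # text taken from the slides
    adapted = interpolate(general, slides, weight=0.5)
    # Words that appear on the slides gain probability mass.
    print(sorted(adapted.items(), key=lambda kv: -kv[1]))
```

Real systems use n-gram or neural models and tune the weight on held-out data; the point is only that lecture-specific vocabulary becomes more likely.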
Intelligent interaction
• Postediting automatic transcriptions/translations
• The user invests the least possible effort
• The system learns the most from it
• Confidence measures
• Fast constrained search
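One plausible reading of how confidence measures reduce user effort is that the system asks the user to check only the words it is least sure about. The sketch below assumes per-word confidence scores in [0, 1] and an effort budget expressed as a fraction of the words; the function name and the budget parameter are hypothetical, not part of the transLectures tools.

```python
def words_to_supervise(words, confidences, budget=0.2):
    """Return the positions of the lowest-confidence words.

    At most a `budget` fraction of the words is selected, so the user
    reviews only the parts most likely to contain errors.
    """
    if len(words) != len(confidences):
        raise ValueError("need exactly one confidence score per word")
    k = int(len(words) * budget)
    # Rank word positions from least to most confident.
    ranked = sorted(range(len(words)), key=lambda i: confidences[i])
    # Return the selected positions in reading order.
    return sorted(ranked[:k])


if __name__ == "__main__":
    words = ["statistical", "speech", "wreck", "ignition"]
    scores = [0.90, 0.40, 0.95, 0.20]
    for i in words_to_supervise(words, scores, budget=0.5):
        print(f"please check word {i}: {words[i]!r}")
```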
Implementation and integration
• Videolectures.NET
• Polimedia
• Opencast Matterhorn
Online HTML5 video player with editing capabilities.
The user interface has three different editing layouts and full keyboard support.
User interaction statistics are analyzed to improve the user experience and develop a user model.
The tL player
tL player
Manual upload of lectures
transLectures: tools available
• The transLectures-UPV Toolkit (TLK) for ASR: www.translectures.eu/tlk
• RWTH Aachen: rASR, Jane (MT): http://www-i6.informatik.rwth-aachen.de/web/Software/
Note that you need an acoustic model and a language model
transLectures: tools at M30
• The tL player (& editor)
• tL Opencast Matterhorn module
• Cloud service for testing
• Coming soon at M30 (www.translectures.eu)
More info at the OCWC conference (Ljubljana) in April 2014
Next steps for transLectures
• Keep improving ASR and MT results
• Keep improving tL open source tools (TLK, tL player)
• External user evaluations (VL.NET and Polimedia)
• External trials: implementation in other universities
Next EU project: EMMA
• MOOC related project
• transLectures work will add 7 new transcription systems (English, Italian, Spanish, French, Dutch, Portuguese and Estonian)
• … and 8 translation systems (from Italian, Spanish, French, Dutch, Portuguese and Estonian into English; and from English into Italian and Spanish)
• Beginning in 2014
www.translectures.eu
My mail (Carlos Turró)
Project coordinator: Alfons Juan-Ciscar
EC FP7 ICT Programme – Project Number 287755
Thanks!