••• 1
Roberto Cencioni
Kimmo Rossi
Multilingual Web
Theme 5 of the ICT-PSP Workprogramme
DG Information Society and Media
Unit INFSO.E1 Language Technologies & Machine Translation
[email protected]
ICT 2008, 26 Nov 08
••• 2
• Why?
– new online paradigms centred around communication, collaboration, co-creation … but significant language barriers remain
– EU comprises 27 countries & 23 official languages
– single European Information Space – one of the i2010 objectives
– EC communication on Multilingualism (Sept ‘08) calls for a broader policy framework & joint action
• Purpose: support & enhance
interpersonal & business communication
information access & publishing
across languages
Baseline
••• 3
A few facts
• EU official languages: 23 x 22 = 506 pairs
– EC MT (Systran core engine) has 18 pairs in operation & 10 more pairs at prototype stage
– 60+ national, regional & minority languages within the EU
• English accounts for 30% of today’s Web content
– 50% in 2000, 35% in 2004
– Arabic, Chinese, Portuguese … growing very fast
• nearly 1.5 billion internet users worldwide (2008)
– c. 320 million native EN speakers in the world
• basic requirements for the “digital translation market”:
– volume
– access
– personalisation
real quick, real cheap
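The pair count above comes from counting ordered (source, target) combinations. A minimal Python sketch of that arithmetic follows, using only the figures quoted on this slide; the function and variable names are illustrative and not part of the original presentation.

```python
def directed_language_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


# Figures taken from the slide.
EU_OFFICIAL_LANGUAGES = 23
EC_MT_OPERATIONAL = 18   # pairs in operation
EC_MT_PROTOTYPE = 10     # further pairs at prototype stage

total_pairs = directed_language_pairs(EU_OFFICIAL_LANGUAGES)
operational_share = EC_MT_OPERATIONAL / total_pairs
with_prototypes_share = (EC_MT_OPERATIONAL + EC_MT_PROTOTYPE) / total_pairs

print(f"{total_pairs} pairs")
print(f"operational: {operational_share:.1%}")
print(f"incl. prototypes: {with_prototypes_share:.1%}")
```

On these figures the operational pairs cover roughly 3.6% of the 506 directions, and about 5.5% once the prototype pairs are counted.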
••• 4
Here we are
• a new unit established in July 2008
– Language Technologies & Machine Translation (INFSO.E1)
– high expectations vs. low rate of EC S&T activity in the last few years
• language is everywhere
– written & spoken; documents, messages, databases, webpages, multimedia objects etc; information as well as meta-information
• but our resources are limited, so initial focus on
– multilingual technologies, services, applications
• two instruments in 2009:
– Research: FP7 ICT, call 4, Objective 2.2 (Language based Interaction)
– Innovation: CIP ICT-PSP, call 3, Theme 5 (Multilingual Web)
• total budget of 40 Meuro
••• 5
Research vs. Innovation: division of labour
• from
– long term foundational research (FP7)
• through
– applied research & technology development (FP7)
• to
– integration & demonstration (FP7 + PSP)
– infrastructure & resources (FP7 + PSP)
• different scale of ambition (€)
• different degree of maturity (technology → service)
• different timescales & partnerships
••• 6
LT Days
14-15 January, 2009
Luxembourg, JMO conference complex
EC presentations, sessions w/ext speakers, proposal clinics, self-presentations & posters
Agenda & registrations:
cordis.europa.eu/fp7/ict/language-technologies/fp7-call4_en.html
••• 7
Web sources
INFSO.E1 website:
cordis.europa.eu/fp7/ict/language-technologies/..
• FP7-ICT: ../fp7-call4_en.html
• ICT-PSP: ../cip-psp_en.html
– Events & Presentations
– Call guidance notes
– Background material & useful Links …
••• 8
Pre-proposals & Clinics
3 pages max, mail to: [email protected]
• describe the problem your proposal addresses, in particular
– specify the intended user profile and related tasks
– describe actual or prospective applications
– detail data sets: source(s), typology, volume
• how will the proposed project contribute to the outcomes and impacts set out in the work programme?
– what are the key innovations?
– what will be the main concrete results?
– what public outputs are foreseen?
– what impact do you expect?
• describe the consortium
– give partners' names or profiles and the intended skills mix
– indicate the intended instrument (if known)
• indicate the scale of your ambition
– what is the estimated effort (man-months)?
– how long will the proposed project last?
– what amount of EU funding are you looking for?
••• 9
ICT-PSP Call
Overview
••• 10
ICT-PSP Call 3, ~Feb 09
ICT Policy Support Programme (PSP) within the Competitiveness & Innovation Framework Programme (CIP) (adopted in October 2006)
• geared towards innovation & ICT uptake:
– development of the Single European information space
– strengthening of the internal market for ICT products and services and ICT-based products and services
– stimulation of innovation through the wider adoption of and investment in ICT
• ensure seamless access to ICT-based services
• improve the conditions for the development of digital content, taking into account multilingualism & cultural diversity
Takes over eContentplus activities from Jan 2009
••• 11
• translation & interpretation market (exc. in-house):
– c. $15 billion; €1.1 billion for EU institutions alone (2006)
– est. 300,000 full time salaried translators worldwide (37% in Europe)
• market fragmentation
– big players < 1000 employees
– top EU-based translation company posted a revenue of $175 million in 2006
• a good European base
– SDL, Star, RWS, XRX, Euroscript, Logos, Moravia, VistaTEC, Semantix …
– ESTeam, Lucy Software …
• a largely untapped potential
– 4x according to some companies
“Europe’s language is Translation”
••• 12
Business world
• new models: Most companies follow the age-old translate-edit-proofread model of translation. Collaborative, web-based technologies allow translation to become more agile, faster, and better with fewer steps (CSA Inc.)
• new markets: Language Weaver is entering the three new strategic markets – Web Content, Business Intelligence and Customer Care – to provide high-volume, high-speed, and accurate automated translation solutions at a price that would have been unfathomable just a few years ago
• new approaches: If you don't see your native language here, you can help Google create it by becoming a volunteer translator. Check out our Google in Your Language program
• and then of course:
Unfortunately for Google as a person with 7 years of translation experience myself I can tell that you will hardly ever find a translator who will agree that machine translation can be useful for anything. (a Russian translator)
••• 13
ICT-PSP Call 3, Theme 5:
Multilingual Web
• 3 objectives:
– machine translation for the multilingual Web (pilot projects)
– multilingual Web content management (pilot projects)
– best practices & standards for the multilingual Web (thematic network)
• 14 Meuro in total, around 6 projects
“The duration of the pilot is expected to be 24 to 36 months within which there should be a 12-month operational phase.”
••• 14
ICT-PSP Call 3, Theme 5:
Multilingual Web
• research: no, at least not ICT research …
• development/engineering:
– configuration, optimisation, customisation, integration … of existing (state of the art) methods, tools & services with a view to defining new approaches, offerings & practices
• demonstration:
– innovative combination is key; new business models, processes & services, organisational setups, usability …
– evaluation along user, technical & (socio-)economic dimensions
• problem orientation:
– useful & useable although possibly not perfect; think ROI
••• 15
Scope & defs
• MT as defined in the ICT-PSP workprogramme encompasses
1. fully automatic machine translation, whatever the technology
2. interactive computer-aided translation (eg TM)
3. a suitable combination of 1. and/or 2. with web based
– human translation, proof-reading & post-editing, incl. where relevant methods inspired by social networks
– workflow & content management systems, …
• innovative & effective combination of people, processes & technology; the end result is not science, rather
– more and/or better output
– save time
– cut cost
• emphasis on language transfer, from source language to target language(s)
– language input-output (e.g. speech-to-text) is not the focus
– cross-platform, multi-format content access/delivery is key
••• 16
Language coverage
• some of the work is expected to be language independent
– flexibility & ease of adaptation to other languages are key factors
– content authoring & management, collaboration & workflow … are language independent anyway
• project outcomes must be validated in 3+ languages
– preferably belonging to different linguistic families
• target languages are chosen & justified by the proposers bearing in mind the following priorities (from high to low):
1. EU official languages
2. nationally recognised languages
3. regional languages
4. minority languages
• Non-EU world languages linked to global markets & exports can be considered as well
– on a proposal by proposal basis
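As an illustration only, the coverage rules above (3+ validation languages, preferably from different families, ranked by the four priority tiers) could be checked along these lines. This is a hypothetical sketch: the `Language` record, the tier keys and the family labels are assumptions introduced here, not definitions from the work programme.

```python
from dataclasses import dataclass

# Priority tiers as listed on the slide, 1 = highest priority.
# The dictionary keys are hypothetical labels chosen for this sketch.
PRIORITY = {
    "eu_official": 1,
    "national": 2,
    "regional": 3,
    "minority": 4,
}


@dataclass(frozen=True)
class Language:
    name: str
    family: str   # illustrative family label supplied by the proposer
    status: str   # one of the PRIORITY keys


def check_coverage(languages, min_languages=3):
    """Summarise a proposal's language coverage against the slide's criteria."""
    families = {lang.family for lang in languages}
    ranked = sorted(languages, key=lambda lang: PRIORITY[lang.status])
    return {
        "enough_languages": len(languages) >= min_languages,
        "distinct_families": len(families),
        "by_priority": [lang.name for lang in ranked],
    }


# Example proposal: three EU official languages from three families.
proposal = [
    Language("Finnish", "Uralic", "eu_official"),
    Language("Polish", "Indo-European", "eu_official"),
    Language("Maltese", "Afro-Asiatic", "eu_official"),
]
report = check_coverage(proposal)
print(report)
```

The slide states the family spread as a preference rather than a hard rule, so the sketch only reports the number of distinct families instead of rejecting a proposal on that basis.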
••• 17
Cont’d
• project’s language coverage driven by the need to:
– address gaps & overcome barriers e.g. cross-border communication for less-developed languages, or
– exploit opportunities e.g. address emerging markets & sizeable language communities
• impact is key, so: viability, sustainability, exploitation channels, deployment prospects …
• main findings must be pro-actively disseminated
• some form of public showcase is mandatory
• participants should include
– private or public sector content owners & aggregators
– providers of language services, technology suppliers
– (online) communities of interest where relevant
• 6-7 partners/project, up to €2.5 million funding, up to 36 months
••• 18
ICT-PSP Call 3, exp. Feb 09
3 intertwined objectives:
5.1 machine translation for the multilingual Web (projects)
information access: MT and other multilingual solutions for information access & use, esp. cross-lingual search & retrieval
information publishing: MT to create, distribute and (re-)use more widely & effectively online content in a multilingual environment
5.3 multilingual Web content management (projects)
communication: multilingual Web content development & management; design, authoring, versioning & maintenance of multilingual Web sites, portals or repositories
5.2 standards & best practices for the multilingual Web (network)
conventions & best practices for multilingual Web content
••• 19
ICT-PSP, 5.3 multilingual Web content management
• methods, techniques, metrics … for developing & managing multilingual web content & services
– much more than translation; significant cultural elements
• think of
– one big website in many languages, or
– several interrelated websites, one country/language each
• now think of how to maintain the integrity & consistency of such resources, effectively & over a long period of time
– and how to detect & repair gaps or inconsistencies
• so, beyond the “translation” step (obj 5.1):
– design, authoring, versioning & maintenance of (multiple, parallel, interconnected …) websites, portals or repositories
– in a distributed collaborative environment, possibly across organisational boundaries
• so as to turn a multi-million endeavour into a viable proposition for a much broader range of companies & administrations
••• 20
ICT-PSP, 5.1 machine translation for the multilingual Web
5.1 can be seen as a subset & central component of obj 5.3 (its “translation box”)
• different usages:
– web at large, enterprise, public information repositories …
• different users:
– teams as well as individuals, engineers as well as analysts, sales & marketing, language professionals, … you & me
• different content-rich, information-bound sectors, private & public
• quality depends on task & user
– from raw translation & “gisting” up to error-free translation
• two important conditions:
– widely recognised, well argued problem; clearly identified target community
– thorough validation in a given domain / for a given task volume metrics
••• 21
ICT-PSP, 5.2 standards & best practices
Thematic network
• covers the same broad issues as 5.3
– “the web as THE vehicle for multilingual content & services”
• provides a forum for multilateral exchange of experience & consensus building
• structure & tasks to be defined by the proposers, indicative list:
– bring together a meaningful subset of the main stakeholders, possibly through their own groups & associations
– ICT & language industries, content aggregators/distributors, e-services, multinational agencies, industry & de-jure standards bodies …
– analyse current situation, identify gaps & bottlenecks; assess market failures if any, specify technical & non-technical conditions to be met and the respective actors
– establish roadmap (trends, requirements, dependencies …) for further developments in the coming years
– stimulate consensus & active involvement/coordination; take part in leading conferences, liaise with primary associations etc.
– explore means to promote best practice (conferences, portals, publications, training …) beyond current channels
– identify & describe suitable follow-on actions
••• 22
ICT-PSP Instruments & Funding
• pilot B projects:
– min. 4 partners from 4 different countries
– 50% of eligible direct costs
– 30% overhead rate of personnel costs
• thematic networks:
– min. 7 partners from 7 different countries
– lump sum; for 3 years and 1+10 participants:
coordinator: 95 Keuro
other participants: 24 Keuro each
ec.europa.eu/information_society/activities/ict_psp/participating/index_en.htm
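A rough Python sketch of the funding arithmetic on this slide follows. The thematic-network figures are the ones quoted above for a 3-year network with 1+10 participants; reusing the same per-participant amounts for other network sizes is an assumption. For pilot B, the slide does not say how the 50% rate and the 30% overhead combine, so the function assumes the overhead is added to the direct costs before the 50% rate is applied. That is one possible reading, not the official rule, and all function names are hypothetical.

```python
def thematic_network_lump_sum_keuro(other_participants,
                                    coordinator_keuro=95,
                                    participant_keuro=24):
    """Lump sum in Keuro: coordinator amount plus a flat amount per other participant.

    Amounts are those quoted on the slide for 3 years and 1+10 participants.
    """
    return coordinator_keuro + other_participants * participant_keuro


def pilot_b_contribution(personnel_costs, other_direct_costs,
                         funding_rate=0.50, overhead_rate=0.30):
    """EU contribution to a pilot B project under this sketch's ASSUMED reading.

    Assumption: overhead = 30% of personnel costs, added to the direct costs,
    and the 50% funding rate is then applied to the resulting total.
    """
    overhead = overhead_rate * personnel_costs
    cost_base = personnel_costs + other_direct_costs + overhead
    return funding_rate * cost_base


# Network with 1 coordinator + 10 other participants, as on the slide.
network_total = thematic_network_lump_sum_keuro(10)
print(f"thematic network: {network_total} Keuro")

# Hypothetical pilot: 1,000,000 euro personnel, 200,000 euro other direct costs.
pilot_total = pilot_b_contribution(1_000_000, 200_000)
print(f"pilot B (assumed formula): {pilot_total:,.0f} euro")
```

For the 1+10 case the lump sum works out to 95 + 10 × 24 = 335 Keuro over the three years.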
••• 23
Practical info
ICT-PSP Theme 5 – Multilingual Web
budget: 14 Meuro under Call 3
managed by: Unit E1
Email: [email protected]
EC contact: Mr Kimmo Rossi
• inquiries: from the call publication date (~Feb)
• pre-proposals: from publication until 3 weeks before the call closing date
••• 24
Events
Language Technology Days:
14-15 Jan 2009, Luxembourg
ICT-PSP Info Day:
26 Jan 2009, Brussels (tbc)
Email: [email protected]
URL: cordis.europa.eu/fp7/ict/language-technologies/..
FP7-ICT: ../fp7-call4_en.html
ICT-PSP: ../cip-psp_en.html