In two minds: How to teach translation students to learn from parallel corpora

Tomaž Erjavec
Department of Intelligent Systems, Jožef Stefan Institute
[email protected]

Špela Vintar
Department of Translation and Interpreting, University of Ljubljana
[email protected]

Jan 14, 2016




Overview

The corpus and concordancer
Using the resource to teach students


The IJS-ELAN parallel corpus

EU MLIS project ELAN: IJS
Slovene-English parallel texts
1 million words, 15 texts
sentence aligned, tokenised
TEI encoded
freely available: http://nl.ijs.si/elan/


Example TU

<tu lang="sl-en" id="spor.902">
  <seg lang="sl"><w type=dig>117.</w> <w>&ccaron;len</w></seg>
  <seg lang="en"><w>Article</w> <w type=dig>117</w></seg>
</tu>

<tu lang="en-sl" id="gnpo.303">
  <seg lang="en"><w>Memory</w> <w>exhausted</w></seg>
  <seg lang="sl"><w>zmanjkalo</w> <w>pomnilnika</w></seg>
</tu>
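To make the structure of a translation unit concrete, here is a minimal Python sketch that pulls the tokens of each language out of the two units shown above. It is only an illustration: the function name parse_tu and the regular expressions are assumptions, not part of the IJS-ELAN tooling, and a regex is used instead of an XML parser because the sample contains an unquoted attribute (type=dig) and a named character entity (&ccaron;).

```python
import html
import re

# The two translation units from the slide, copied as plain strings.
TU_1 = ('<tu lang="sl-en" id="spor.902">'
        '<seg lang="sl"><w type=dig>117.</w> <w>&ccaron;len</w></seg>'
        '<seg lang="en"><w>Article</w> <w type=dig>117</w></seg></tu>')
TU_2 = ('<tu lang="en-sl" id="gnpo.303">'
        '<seg lang="en"><w>Memory</w> <w>exhausted</w></seg>'
        '<seg lang="sl"><w>zmanjkalo</w> <w>pomnilnika</w></seg></tu>')

# Hypothetical helper patterns: one aligned segment, one token.
SEG_RE = re.compile(r'<seg lang="([^"]+)">(.*?)</seg>', re.S)
W_RE = re.compile(r'<w\b[^>]*>(.*?)</w>', re.S)


def parse_tu(tu_text):
    """Return {language: [tokens]} for one translation unit."""
    segments = {}
    for lang, body in SEG_RE.findall(tu_text):
        # html.unescape resolves named entities such as &ccaron;.
        segments[lang] = [html.unescape(tok) for tok in W_RE.findall(body)]
    return segments


print(parse_tu(TU_1))
print(parse_tu(TU_2))
```

Each unit comes back as a pair of token lists keyed by language, which is the shape an aligned query needs: a match in one language can be shown together with its counterpart segment.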


Web concordance

IMS CQP backend
CGI Perl interface
Apache server
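The slide names only the software stack. As a rough idea of what a concordancer hands back to the user, the following Python sketch builds keyword-in-context lines from a token list. It is not the CQP or Perl code behind the web interface; the function kwic and its width parameter are hypothetical, and the sample sentence is one of the English concordance fragments from the corpus examples.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, hit, right context) for each occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


tokens = "This schedule shall provide for a phasing-out".split()
for left, hit, right in kwic(tokens, "shall", width=2):
    print(f"{left:>20} [{hit}] {right}")
```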


Queries

Vanilla queries: dog*, *dog
Full regular expressions: "dog.*"
Positional attributes: [num="dual"]
Expressions over tokens
Constraints on aligned segments
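The relation between the first two query types can be sketched in a few lines of Python: a wildcard query such as dog* is a shorthand for a regular expression that must match the whole token. The helper names below are hypothetical and the token list is invented for illustration; this is not how the web interface itself is implemented.

```python
import re


def wildcard_to_regex(query):
    """Turn a wildcard query into a compiled regex; '*' means any run of characters."""
    parts = [re.escape(part) for part in query.split("*")]
    return re.compile(".*".join(parts))


def matching_tokens(query, tokens):
    """Keep the tokens that the wildcard query matches in full."""
    pattern = wildcard_to_regex(query)
    return [tok for tok in tokens if pattern.fullmatch(tok)]


tokens = ["dog", "dogs", "bulldog", "cat", "dogma"]  # invented sample
print(matching_tokens("dog*", tokens))  # ['dog', 'dogs', 'dogma']
print(matching_tokens("*dog", tokens))  # ['dog', 'bulldog']
```

The match is case-sensitive here, so Dog would not be found by dog*.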


Using the corpus in translator training:

Developing corpus literacy

what is a corpus?
what's in the corpus?
how to find things in the corpus?
how to use the results?


Formulating corpus queries

learning to formalize language
wordform vs. lemma (Slovene!)
using parallel search to filter out unwanted examples
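A small Python sketch of the last two points, under stated assumptions: the pairs are abbreviated fragments of Slovene-English segments from the corpus, the function parallel_search is hypothetical, and the suffix pattern \w* only approximates a lemma query. Because Slovene inflects heavily, a single wordform such as "lokalna samouprava" would miss "lokalne samouprave" and "lokalno samoupravo"; a condition on the English side then filters out renderings the student has already seen.

```python
import re

# Abbreviated (Slovene, English) fragments; a real query would run over full segments.
PAIRS = [
    ("med državo in lokalno samoupravo", "between the state and local government"),
    ("razvoj lokalne samouprave", "development of local administration"),
    ("raven lokalne samouprave", "the level of local self-government"),
]


def parallel_search(pairs, source_re, exclude_target_re=None):
    """Return pairs whose source matches source_re and whose target does not match the exclusion."""
    hits = []
    for source, target in pairs:
        if not re.search(source_re, source):
            continue
        if exclude_target_re and re.search(exclude_target_re, target):
            continue
        hits.append((source, target))
    return hits


# All inflected forms of the term, minus the rendering already known.
for source, target in parallel_search(PAIRS, r"lokaln\w* samouprav\w*",
                                      exclude_target_re=r"local government"):
    print(source, "->", target)
```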


Evaluating the results

critical eye: corpus translations may be false or bad

before relying on quantitative data, consider corpus composition

corpus != dictionary


Types of activities

frontal presentations
group work
individual work - translating with the corpus
seminar assignments


Things to observe

translation (in)equivalence, terminological variety
word-formation strategies
pragmatic/cultural conventions of text types
contrastive analysis
other translation strategies


lokaln* samouprav* ?

kuca: z ustreznim razmerjem med državo in lokalno samoupravo, med središčem države in
      A society with an appropriate relationship between the state and local government, between the national centre and individual regions.

parl: obstajati. Specifične oblike lokalne samouprave so Slovenci poznali pod imenom župa,
      Specific forms of local self-administration were known to Slovenes by the term župa, which meant one or more villages led by a župan.

ecmr: reforme javne uprave, razvoj lokalne samouprave, pa tudi oceno kadrovskih potreb in
      It is therefore an operative document which, apart from strategic goals, defines the areas of reforms, macro- and micro-economic policy measures, development of judicial system, public administration reform, development of local administration, as well as an estimate of the staff and financing requirements for realisation of those reforms.

ekol: okolja33. V ta sklop sodi tudi raven lokalne samouprave s svojimi pristojnostmi na področju
      This also includes the level of local self-government with its responsibilities in the area of environmental protection, which otherwise are dealt with in a special chapter.
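The terminological variety in these four hits can be made countable. The Python sketch below tallies which English rendering of the term appears in each aligned segment; the segments are shortened copies of the English sentences above, and the function tally_renderings and the candidate list are illustrative assumptions, not a feature of the concordancer.

```python
from collections import Counter

# Shortened English segments, keyed by the text code shown in the concordance.
ENGLISH = {
    "kuca": "relationship between the state and local government, between the national centre",
    "parl": "Specific forms of local self-administration were known to Slovenes",
    "ecmr": "public administration reform, development of local administration, as well as",
    "ekol": "This also includes the level of local self-government with its responsibilities",
}

# Candidate renderings, collected by reading the hits.
CANDIDATES = [
    "local self-government",
    "local self-administration",
    "local government",
    "local administration",
]


def tally_renderings(segments, candidates):
    """Count, per candidate, the segments in which it occurs (first match wins)."""
    counts = Counter()
    for text in segments:
        lowered = text.lower()
        for candidate in candidates:
            if candidate in lowered:
                counts[candidate] += 1
                break
    return counts


for rendering, n in tally_renderings(ENGLISH.values(), CANDIDATES).items():
    print(n, rendering)
```

With only four hits every rendering occurs once, which is itself the finding: four texts, four different translations of the same term.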




*bug*

20 bugs
13 bug
9 debugging
8 debug
3 buggers
3 bug-free
2 buggy
2 Debugging
1 [email protected]
1 [email protected]
1 debuggers
1 debugger
1 [email protected]
1 [email protected]
1 bug-fixes
1 [email protected]

*hrošč*

11 hroščev
6 hrošču
5 razhroščevanje
5 hroščih
4 hrošče
3 hrošč
2 razhroščevanja
2 razhroščevalnega
2 hrošči
2 Razhroščevanje
1 razhroščujejo
1 razhroščiti
1 razhroščevanju
1 razhroščevalniku
1 razhroščevalniki
1 razhroščevalnik
1 razhroščevalnih
1 razhroščevalne
1 hroščem
1 hroščati
1 hroščat
1 hrošča
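Frequency lists like the two above can be produced by counting every distinct token that a wildcard query matches. The following Python sketch shows the idea on an invented sentence; the function wildcard_frequencies is hypothetical and makes no claim about how the web concordancer computes its counts. Matching is case-sensitive, which is why forms such as Debugging and debugging appear as separate entries in the lists.

```python
import re
from collections import Counter


def wildcard_frequencies(query, tokens):
    """Count distinct tokens matched in full by a wildcard query, most frequent first."""
    parts = [re.escape(part) for part in query.split("*")]
    pattern = re.compile(".*".join(parts))
    counts = Counter(tok for tok in tokens if pattern.fullmatch(tok))
    return counts.most_common()


# Invented sample text, not taken from the corpus.
tokens = "the debugger found two bugs ; one bug was fixed , the other bugs remain".split()
for form, n in wildcard_frequencies("*bug*", tokens):
    print(n, form)
```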




Ways of translating deontic modality - shall

usta: Within its own territory, Slovenia shall protect human rights and fundamental
      Država na svojem ozemlju varuje človekove pravice in temeljne svoboščine.

usta: 11 The official language of Slovenia shall be Slovenian. In those areas where
      Uradni jezik v Sloveniji je slovenščina.

spor: This schedule shall provide for a phasing-out
      Ta razpored mora predvideti postopno opuščanje tako uvedenih carin, s katerim je treba začeti najkasneje dve leti po uvedbi dajatev, in sicer po enakih letnih stopnjah.

orwl: " " Obviously we shall put it off as long as
      " Nujno jo morava odložiti za tako dolgo, kot moreva. "

kuca: a state which shall be fair to all,
      Je pa v moči vseh državljank in državljanov, da si ustvarijo tako državo, ki bo pravična do vseh, ne glede na njihove poglede na svet, politično prepričanje ali narodno pripadnost.

kuca: world. Thus we shall create harmony
      Tako bomo ustvarjali ravnovesje v sebi, z drugimi in z okoljem.
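The six hits fall into three groups: Slovene uses a form of the modal verb morati, the future auxiliary (bo, bomo), or no marker at all. A crude Python sketch of that sorting follows. The two patterns and the labels are assumptions made for illustration; they are surface heuristics, not a linguistic analysis, and would misfire on a larger sample. The sentences are shortened copies of the Slovene segments above.

```python
import re
from collections import Counter

# Heuristic patterns (assumptions): present forms of "morati" all start with "mora";
# the future auxiliary of "biti" is listed form by form.
MODAL = re.compile(r"\bmora\w*", re.IGNORECASE)
FUTURE = re.compile(r"\b(bom|boš|bo|bova|bosta|bomo|boste|bodo)\b", re.IGNORECASE)

SLOVENE = [
    "Država na svojem ozemlju varuje človekove pravice in temeljne svoboščine.",
    "Uradni jezik v Sloveniji je slovenščina.",
    "Ta razpored mora predvideti postopno opuščanje tako uvedenih carin",
    "Nujno jo morava odložiti za tako dolgo, kot moreva.",
    "tako državo, ki bo pravična do vseh",
    "Tako bomo ustvarjali ravnovesje v sebi, z drugimi in z okoljem.",
]


def classify(sentence):
    """Label how the sentence renders English 'shall', by surface markers only."""
    if MODAL.search(sentence):
        return "modal verb (morati)"
    if FUTURE.search(sentence):
        return "future tense"
    return "no marker (plain present)"


for label, n in Counter(classify(s) for s in SLOVENE).items():
    print(n, label)
```

On these six sentences each group gets two members, with the unmarked present in the constitutional text, the modal in the agreement and the novel, and the future in the speech.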




A peek into the log file

~1,900 different queries since 1999
L2 search: prevarication, forfeiture, runlevel, kernel
lexical-gap words: bias, retrieve, prepoznavnost
culturally bound words: potica, kozolec
(multiword) terms: legira.* (alloy steel)