Language technology for morphologically rich languages
Trond Trosterud
Giellatekno, Centre for Saami Language Technology
http://giellatekno.uit.no/
September 5, 2017
mlp.computing.dcu.ie/mlp2017/docs/trosterud.pdf
Contents
A very subjective history of language technology
A model for all the other languages
Conclusion
A very subjective history of language technology
▶ The computers came with the cold war
  ▶ Our task was to build MT from Russian to English
▶ First attempt (ask the cryptographers):
  ▶ Machine translation seen as a noisy channel?
▶ Second attempt (ask the linguists):
  ▶ Generative grammar promised to ... generate grammatical sentences
▶ 1966: The Alpac report
  ▶ We (the linguists) had failed
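The noisy-channel view treats a Russian sentence r as an English sentence that has been garbled in transmission, so translating means recovering the most probable English source e. The equation below is the standard textbook form of that idea, added here only for illustration; it was made practical much later by the statistical (IBM) models around 1990, not by the early cold-war systems.

```latex
\hat{e} = \arg\max_{e} P(e \mid r)
        = \arg\max_{e} \frac{P(e)\, P(r \mid e)}{P(r)}
        = \arg\max_{e} P(e)\, P(r \mid e)
```

Here P(e) is a language model of English and P(r | e) is the translation (channel) model; P(r) can be dropped because it does not depend on e.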
Some of the critique is still valid
▶ Bar-Hillel 1960:
  ▶ Little John was looking for his toy box. Finally he found it. The box was in the pen.
▶ Google Translate 2017:
  ▶ Lille John var på utkikk etter sin leketøyboks. Til slutt fant han det. Boksen var i pennen.
  (pennen = "the (writing) pen", not the playpen)
The post-Alpac world of formal linguistics 1
▶ Not that much MT for a long while, but:
  ▶ Formal linguistics
▶ Until 1980: Chomskyan generative grammar
▶ After 1980: Chomsky went for “Universal Grammar”
  (= left the field of grammar modelling)
▶ Alternative generative models (LFG, HPSG)
  ▶ did not result in robust parsers
The post-Alpac world of formal linguistics 2
▶ An alternative approach to morphophonology
  ▶ C. Douglas Johnson 1972: Formal Aspects of Phonological Description:
    rewrite rules (A → B | C _ D) as finite-state transducers
▶ Kimmo Koskenniemi 1983: Rewrite rules as parallel relations
▶ Around 1990: Xerox builds efficient compilers
▶ The word form problem was solved
  (we will return to the relevance this has for tomorrow’s shared task)
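To make the rewrite-rules-as-transducers idea concrete: a rule A → B | C _ D can be carried out by a machine that reads the input left to right and only has to remember a bounded amount of context. The sketch below hand-builds two such transducers for the hypothetical form kaNpat (N = an underlying nasal, a common textbook illustration) and applies them as a cascade: N → m | _ p gives kampat, then p → m | m _ gives kammat. All names (`run`, `nasal_step`, `p_after_m_step`, the states `q0`/`q1`) are invented for illustration; real systems compile such rules automatically, and Koskenniemi's two-level rules apply in parallel rather than in sequence as here.

```python
"""Two context-sensitive rewrite rules as hand-built deterministic
finite-state transducers, applied as a cascade (illustrative sketch only)."""

from typing import Callable, Dict, Tuple

# step(state, input_symbol) -> (next_state, output_string)
Step = Callable[[str, str], Tuple[str, str]]
# A transducer: a step function plus the output to flush in each state at end of input.
Transducer = Tuple[Step, Dict[str, str]]


def run(fst: Transducer, word: str) -> str:
    """Feed `word` through the transducer one symbol at a time."""
    step, flush = fst
    state, out = "q0", []
    for symbol in word:
        state, emitted = step(state, symbol)
        out.append(emitted)
    out.append(flush[state])  # emit any output still held back at the end
    return "".join(out)


def nasal_step(state: str, symbol: str) -> Tuple[str, str]:
    """N -> m | _ p. In q1 an N has been read but not yet output, because the
    next symbol decides whether it surfaces as N or as m."""
    held = "N" if state == "q1" else ""
    if symbol == "N":
        return "q1", held            # release any earlier N, hold the new one
    if state == "q1" and symbol == "p":
        return "q0", "mp"            # right context found: N surfaces as m
    return "q0", held + symbol       # context failed: N surfaces unchanged


def p_after_m_step(state: str, symbol: str) -> Tuple[str, str]:
    """p -> m | m _. q1 means the previous input symbol was m."""
    out = "m" if state == "q1" and symbol == "p" else symbol
    return ("q1" if symbol == "m" else "q0"), out


NASAL_ASSIMILATION: Transducer = (nasal_step, {"q0": "", "q1": "N"})
P_TO_M: Transducer = (p_after_m_step, {"q0": "", "q1": ""})

if __name__ == "__main__":
    intermediate = run(NASAL_ASSIMILATION, "kaNpat")
    surface = run(P_TO_M, intermediate)
    print(intermediate, surface)  # kampat kammat
```

Running the two machines one after the other corresponds to applying the rules in order; composing them into a single transducer would map kaNpat directly to kammat.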
In came the nineties
▶ Finally, the linguists had broken the code: we came up with a technology combining robustness and depth
  1. Finite-state transducers had solved analysis / generation
  2. Constraint Grammar solved the homonymy problem
▶ Resolving ambiguity in context: John tries to walk the walk
  ==> context-sensitive disambiguation rules (Fred Karlsson, Pasi Tapanainen, Eckhard Bick)
▶ Our moment in the limelight: the British National Corpus was annotated with finite-state transducers and Constraint Grammar
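The Constraint Grammar idea of disambiguation in context can be sketched as follows: every word starts with all the readings the morphological analyser gives it, and rules remove readings that are impossible in the given context, never removing the last one. The tags, the tiny lexicon and the four rules are hypothetical, chosen only to handle "John tries to walk the walk". They are loosely modelled on CG rules of roughly the form REMOVE (V) IF (-1 (DET)), but real CG compilers (such as VISL CG-3) have much richer machinery (SELECT rules, careful contexts, barriers, rule sections) that this sketch does not attempt.

```python
"""Toy Constraint Grammar-style disambiguation (illustrative sketch only)."""

from dataclasses import dataclass
from typing import List, Set, Tuple

Cohort = Tuple[str, Set[str]]  # (word form, remaining readings)


@dataclass(frozen=True)
class Remove:
    target: str   # reading to discard
    offset: int   # position of the context word relative to the target word
    context: str  # the context word must have this reading (non-careful test)


def disambiguate(cohorts: List[Cohort], rules: List[Remove]) -> List[Cohort]:
    """Apply REMOVE rules repeatedly until no rule fires any more."""
    changed = True
    while changed:
        changed = False
        for i, (_, readings) in enumerate(cohorts):
            for rule in rules:
                j = i + rule.offset
                if not 0 <= j < len(cohorts):
                    continue  # context position falls outside the sentence
                # As in CG, the last remaining reading is never removed.
                if (rule.target in readings and len(readings) > 1
                        and rule.context in cohorts[j][1]):
                    readings.discard(rule.target)
                    changed = True
    return cohorts


LEXICON = {
    "John": {"PROP"},
    "tries": {"V", "N"},
    "to": {"INFMARK", "PREP"},
    "walk": {"V", "N"},
    "the": {"DET"},
}

RULES = [
    Remove("N", -1, "INFMARK"),  # "to walk": no noun right after an infinitive marker
    Remove("V", -1, "DET"),      # "the walk": no verb right after a determiner
    Remove("N", -1, "PROP"),     # "John tries": no noun reading right after a proper noun
    Remove("PREP", 1, "V"),      # "to walk": "to" before a possible verb is not a preposition
]

if __name__ == "__main__":
    words = "John tries to walk the walk".split()
    # Copy each reading set so the two occurrences of "walk" are independent.
    cohorts = [(w, set(LEXICON[w])) for w in words]
    for word, readings in disambiguate(cohorts, RULES):
        print(word, sorted(readings))
```

With these rules the first "walk" keeps only V and the second only N, which is exactly the distinction the slide example turns on.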
Then two things happened:
1. The inventors of these techniques commercialised them and lifted them out of the common development (so there were no open compilers or grammars; instead the tools went into grammar checkers for MS Word and into annotating gold corpora for statistical models)
2. Computers got faster and the algorithms better
Statistical methods won the day
▶ “Every time I fire a linguist, my system improves” (attributed to Fred Jelinek)
▶ Morphology is handled by lists
▶ Different types of processing are handled via machine learning
▶ Performance went down, but the algorithms were open;
  good data were closed!
A side note on language typology
▶ We know this quote: “Take a language like, say,
▶ But languages are not like English
There is a growing interest in extending the scope of language technology
▶ (cf. this workshop)
▶ A natural choice (?): extend the model we had for English to these other languages
▶ So far, not too many success stories on this front
  (there are taggers, but not that many end-user applications)
  ▶ No spellchecker for any North American language
  ▶ Very few languages have grammar checkers
  ▶ Far worse MT into Finnish than into other EU languages
  ▶ Bad MT between, say, Swedish and Norwegian
▶ In short, a paucity of working solutions for the majority of languages
Meanwhile, in the grammatical camp:
▶ We have been extending the domain of the rules from the nineties
  ▶ into an analysis that is both robust and deep (dependency annotation at > 95%)
  ▶ and we now have open compilers
▶ ... but our time in the limelight is over
A model for all the other languages
So, the limelight is gone, but here I am, on another scene
▶ As witnessed by the growing concern and a growing number of workshops: the morphologically rich languages are not that easy
▶ Identifying the morphemes is not enough
▶ Perhaps we should have a second look at what happened in the nineties
My answer: A viable model for “all the other languages”
▶ Each language needs a team
  ▶ Programmer (shared)
  ▶ Computational linguist (shared)
  ▶ Linguist
  ▶ ... and eventually a native speaker (preferably also a linguist)
▶ Here is the thing: for every language there is a linguist who has devoted his or her life to it, and language technology has something to offer:
  ==> a test bed for his or her grammatical model
▶ Each team would share the common infrastructure: the Linux model, as it were
But can we repeat the Linux model for language technology?
▶ It turns out we can
▶ cf. two examples:
  ▶ http://giellatekno.uit.no/doc/lang/
  ▶ http://wiki.apertium.org/