LANGUAGE RESEARCH IN SERVICE TO THE NATION
Creating a dual-use pandialectal Pashto grammar
AF-PAK LEARN, Omaha, May 17, 2010
Corey Miller ([email protected]), Anne David, Michael Maxwell, Alina Twist, Claudia Brugman, Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues and Tristan Purvis
Motivation
• Pashto is an indispensable Afghan language critical to our nation’s security
• Pashto is difficult for English speakers
• Updated, comprehensive, learner-oriented Pashto materials are needed
– Grammar
– Easy-access dictionary
What makes Pashto difficult?
• Ergativity
• Up to four cases: direct, oblique, ablative, and vocative
• Multiple noun and adjective declension classes
• Variety of adpositions: prepositions, postpositions, and circumpositions
• Retroflex consonants
• Variety of verbal structures
Project components
• Fieldwork
• Descriptive Grammar
• Dictionary
• Formal Grammar
• Parser
Parser enables easy access to dictionary
Fieldwork
• Identified native speakers of Pashto from Afghanistan and Pakistan living in the US
– Peshawar, Quetta, Pakistan
– Kabul, Kandahar, Afghanistan
• Create and run elicitation guides highlighting range of grammatical features
• Review all paradigms and example sentences, note dialect variation
• Digitally record all sessions
Motivation for descriptive grammar
• Existing materials suffer from liabilities
– dated
– cover single dialect
  • Tegey and Robson 1996: Kabul
  • Penzl 1955: Kandahar
  • Shafeev 1964: Kandahar
– lack Pashto script (Tegey and Robson has it)
Goals for descriptive grammar
• Contemporary data and presentation
• Use of Pashto script and transcription throughout
Pronoun paradigm: incorporation of dialect information
Interlinear example sentences
Adjective paradigm
Formal grammar of inflectional affix
Stem allomorphy in nouns
Formal grammar of phonological rule
Morphological parsing
• Inputs
– Formal grammar
– Dictionary (Lexicon)
• Output capability
– Analysis: given an inflected form, produce possible headwords
– Generation: given a headword, produce possible inflected forms
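The two capabilities can be sketched on a toy scale. This is a minimal illustration, not the project's formal grammar or parser: the names `LEXICON`, `ENDINGS`, `generate`, and `analyze` are hypothetical, and the split of each form into stem plus ending is an assumption inferred from the two example forms of ويشتل shown on the lookup slides.

```python
from collections import defaultdict

# Toy lexicon: headword -> gloss and stems by tense.
# The stem/ending segmentation is an assumption, not taken from the grammar.
LEXICON = {
    "ويشتل": {"gloss": "to shoot", "stems": {"present": "ول", "past": "ويشتل"}},
}

# Hypothetical ending table: (person, number) -> suffix.
ENDINGS = {("1", "sg"): "م"}


def generate(headword):
    """Generation: headword -> list of (inflected form, features)."""
    entry = LEXICON[headword]
    forms = []
    for tense, stem in entry["stems"].items():
        for (person, number), suffix in ENDINGS.items():
            forms.append((stem + suffix, (tense, person, number)))
    return forms


def build_index():
    """Invert generation so that analysis becomes a table lookup."""
    index = defaultdict(list)
    for headword in LEXICON:
        for form, features in generate(headword):
            index[form].append((headword, features))
    return index


INDEX = build_index()


def analyze(form):
    """Analysis: inflected form -> possible (headword, features) pairs."""
    return INDEX.get(form, [])
```

Enumerating every form and inverting the table only works for a lexicon this small. The slides do not say how the actual parser is implemented, so this shows the input/output contract only.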
Uses of morphological parser
• Analysis capability enables dictionary lookup of inflected forms
• Generation has pedagogical uses including self-testing
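One way the self-testing use might look, as a rough sketch: generated forms become quiz items that a learner's answer is checked against. The `GENERATED` table stands in for generator output and is hypothetical, as are `quiz_items` and `check`; the two forms and their translations are the ones given in the lookup example.

```python
# Hypothetical generator output: (headword, meaning) -> expected inflected form.
GENERATED = {
    ("ويشتل", "I am shooting"): "ولم",
    ("ويشتل", "I was shooting"): "ويشتلم",
}


def quiz_items():
    """Yield (prompt, expected answer) pairs for a self-test."""
    for (headword, meaning), form in GENERATED.items():
        yield f"Give the form of {headword} meaning '{meaning}'", form


def check(answer, expected):
    """Compare a learner's answer to the generated form, ignoring outer whitespace."""
    return answer.strip() == expected
```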
How morphological analysis aids lookup
• Inflected forms may differ substantially from citation forms
• Experts can work around this problem, but non-experts often can’t
Translation       Transcription    Pashto
I am shooting     wə́lə́m            ولم
I was shooting    wiʃtə́lə́m         ويشتلم
The parser maps inflected forms to citation forms (headwords)
What does this Pashto word mean?
ولم
Citation form: ويشتل
ويشتل [wishtə́l] (verb) to shoot
Grammatical info: first person singular present imperfective
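The lookup flow on this slide can be sketched as follows. The parser is represented by a hard-coded table, `PARSER_OUTPUT`, which is a hypothetical stand-in and not the project's parser; `DICTIONARY` and `lookup` are likewise illustrative names. The single entry mirrors the slide's example.

```python
# Citation-form dictionary; the entry mirrors the slide.
DICTIONARY = {
    "ويشتل": {"pos": "verb", "gloss": "to shoot"},
}

# Stand-in for parser analyses (hypothetical table, not a real parser).
PARSER_OUTPUT = {
    "ولم": [("ويشتل", "first person singular present imperfective")],
}


def lookup(word):
    """Return dictionary entries for a word, resolving inflected forms first."""
    if word in DICTIONARY:
        # The word is already a citation form.
        candidates = [(word, None)]
    else:
        # Otherwise ask the parser table for possible headwords.
        candidates = PARSER_OUTPUT.get(word, [])
    results = []
    for headword, info in candidates:
        entry = DICTIONARY.get(headword)
        if entry is None:
            continue  # analysis points at a headword the dictionary lacks
        results.append({
            "citation_form": headword,
            "pos": entry["pos"],
            "gloss": entry["gloss"],
            "grammatical_info": info,
        })
    return results
```

A direct search for ولم in `DICTIONARY` finds nothing, which is the situation a non-expert faces; the fallback through the analysis table is what recovers the headword.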
Conclusion
• Updated descriptive grammar based on fieldwork
• Formal grammar and lexicon feed parser
• Parser enables simplified dictionary lookup
• Faster, more informed processing of Pashto
References
• David, Anne and Michael Maxwell. 2008. Joint grammar development by linguists and computer scientists. Workshop on NLP for Less Privileged Languages, Third International Joint Conference on Natural Language Processing, Hyderabad, India.
• Maxwell, Michael and Anne David. 2008. Interoperable Grammars. First International Conference on Global Interoperability for Language Resources, Hong Kong.
• Maxwell, Michael. 2010. Standardization as a means to Sustainability. LREC (to appear).
• Penzl, Herbert. 1955. A Grammar of Pashto. Washington, DC: American Council of Learned Societies.
• Tegey, Habibullah and Barbara Robson. 1996. A Reference Grammar of Pashto. Washington, DC: Center for Applied Linguistics.
• Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. International Journal of American Linguistics 30.