LANGUAGE RESEARCH IN SERVICE TO THE NATION

Creating a dual-use pandialectal Pashto grammar

AF-PAK LEARN, Omaha, May 17, 2010

Corey Miller (cmiller@casl.umd.edu), Anne David, Michael Maxwell, Alina Twist, Claudia Brugman, Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues and Tristan Purvis


Motivation

• Pashto is an indispensable Afghan language critical to our nation’s security

• Pashto is difficult for English speakers

• Updated, comprehensive, learner-oriented Pashto materials are needed
– Grammar
– Easy-access dictionary


What makes Pashto difficult?

• Ergativity

• Up to four cases: direct, oblique, ablative, and vocative

• Multiple noun and adjective declension classes

• Variety of adpositions: prepositions, postpositions, and circumpositions

• Retroflex consonants

• Variety of verbal structures


Project components

• Fieldwork

• Descriptive Grammar

• Dictionary

• Formal Grammar

• Parser

Parser enables easy access to dictionary


Fieldwork

• Identified native speakers of Pashto from Afghanistan and Pakistan living in the US
– Peshawar and Quetta, Pakistan
– Kabul and Kandahar, Afghanistan

• Create and run elicitation guides highlighting the range of grammatical features

• Review all paradigms and example sentences, noting dialect variation

• Digitally record all sessions


Motivation for descriptive grammar

• Existing materials suffer from liabilities

– dated

– cover a single dialect
   • Tegey and Robson 1996: Kabul
   • Penzl 1955: Kandahar
   • Shafeev 1964: Kandahar

– lack Pashto script (Tegey and Robson 1996 has it)


Goals for descriptive grammar

• Contemporary data and presentation

• Use of Pashto script and transcription throughout

• Cover dialect variation wherever it applies


Descriptive grammar

• Pashto language, orthography, phonology

• Adpositions

• Pronouns

• Nouns

• Adjectives

• Verbs

• Dialectology

• Miscellaneous


Pashto dialects


Pronoun paradigm: incorporation of dialect information


Interlinear example sentences


Adjective paradigm


Formal grammar of inflectional affix


Stem allomorphy in nouns


Formal grammar of phonological rule


Morphological parsing

• Inputs
– Formal grammar
– Dictionary (Lexicon)

• Output capability
– Analysis: given an inflected form, produce possible headwords
– Generation: given a headword, produce possible inflected forms
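
The two output capabilities can be illustrated with a minimal sketch. It assumes a hand-listed toy table in place of the formal grammar and lexicon; the names `PARADIGMS`, `generate` and `analyze` are hypothetical and are not part of the project's parser. The only data used are the two forms of ويشتل "to shoot" shown elsewhere in this talk.

```python
# Toy stand-in for grammar + lexicon: (headword, gloss) -> inflected form.
# Assumption: forms are listed by hand here; the actual parser derives them
# from the formal grammar and the dictionary instead of storing them.
PARADIGMS = {
    ("ويشتل", "I am shooting"): "ولم",
    ("ويشتل", "I was shooting"): "ويشتلم",
}


def generate(headword):
    """Generation: given a headword, return its listed inflected forms."""
    return [form for (head, _gloss), form in PARADIGMS.items() if head == headword]


def analyze(inflected):
    """Analysis: given an inflected form, return possible (headword, gloss) pairs."""
    return [key for key, form in PARADIGMS.items() if form == inflected]


if __name__ == "__main__":
    print(analyze("ولم"))
    print(generate("ويشتل"))
```

Analysis is simply the inverse of generation in this sketch. A rule-based parser achieves the same two directions without enumerating every form in advance.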


Uses of morphological parser

• Analysis capability enables dictionary lookup of inflected forms

• Generation has pedagogical uses including self-testing


How morphological analysis aids lookup

• Inflected forms may differ substantially from citation forms

• Experts can work around this problem, but non-experts often can’t

Translation | Transcription | Pashto
I am shooting | wə́lə́m | ولم
I was shooting | wiʃtə́lə́m | ويشتلم


The parser maps inflected forms to citation forms (headwords)

What does this Pashto word mean?

ولم

ويشتل [wishtə́l] (verb) to shoot

Grammatical info: first person singular present imperfective

Citation form: ويشتل
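
The lookup flow on this slide might be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `DICTIONARY`, `ANALYSES` and `look_up` are hypothetical names, and the analysis table is a hand-written stand-in for the parser's output, containing only the entry shown on the slide.

```python
# Hypothetical mini-dictionary keyed by citation form (headword).
DICTIONARY = {"ويشتل": "(verb) to shoot"}

# Hypothetical stand-in for parser output:
# inflected form -> (citation form, grammatical info).
ANALYSES = {
    "ولم": ("ويشتل", "first person singular present imperfective"),
}


def look_up(word):
    """Return a dictionary entry for a citation form or an inflected form."""
    if word in DICTIONARY:
        # The query is already a headword, so no analysis is needed.
        return {"citation_form": word,
                "gloss": DICTIONARY[word],
                "grammatical_info": None}
    if word in ANALYSES:
        # Map the inflected form to its headword, then consult the dictionary.
        head, info = ANALYSES[word]
        return {"citation_form": head,
                "gloss": DICTIONARY[head],
                "grammatical_info": info}
    return None  # unknown word


if __name__ == "__main__":
    print(look_up("ولم"))
```

A user who types the inflected form gets the same entry as one who already knows the citation form, plus the grammatical information recovered by analysis.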


Conclusion

• Updated descriptive grammar based on fieldwork

• Formal grammar and lexicon feed parser

• Parser enables simplified dictionary lookup

• Faster, more informed processing of Pashto


References

• David, Anne and Michael Maxwell. 2008. Joint grammar development by linguists and computer scientists. Workshop on NLP for Less Privileged Languages, Third International Joint Conference on Natural Language Processing, Hyderabad, India.

• Maxwell, Michael and Anne David. 2008. Interoperable Grammars. First International Conference on Global Interoperability for Language Resources, Hong Kong.

• Maxwell, Michael. 2010. Standardization as a means to Sustainability. LREC (to appear).


• Penzl, Herbert. 1955. A Grammar of Pashto. Washington, DC: American Council of Learned Societies.

• Tegey, Habibullah and Barbara Robson. 1996. A Reference Grammar of Pashto. Washington, DC: Center for Applied Linguistics.

• Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. International Journal of American Linguistics 30.
